Abstract

Categorical logic has shown that modern logic is essentially the logic of subsets (or "subobjects").
Partitions are dual to subsets so there is a dual logic of partitions where a "distinction"
[an ordered pair of distinct elements $(u,u')$ from the universe $U$] is dual to an "element". An
element being in a subset is analogous to a partition $\pi$ on $U$ making a distinction, i.e., if $u$ and $u'$
were in different blocks of $\pi$. Subset logic leads to finite probability theory by taking the (Laplacian)
probability as the normalized size of each subset-event of a finite universe. The analogous
step in the logic of partitions is to assign to a partition the number of distinctions made by a
partition normalized by the total number of ordered pairs $|U|^2$ from the finite universe. That
yields a notion of "logical entropy" for partitions and a "logical information theory." The logical
theory directly counts the (normalized) number of distinctions in a partition while Shannon's
theory gives the average number of binary partitions needed to make those same distinctions.
Thus the logical theory is seen as providing a conceptual underpinning for Shannon's theory
based on the logical notion of "distinctions."



1 Towards a Logic of Partitions 

In ordinary logic, a statement $P(a)$ is formed by a predicate $P(x)$ applying to an individual name
"a" (which could be an n-tuple in the case of relations). The predicate is modeled by a subset $S_P$
of a universe set $U$ and an individual name such as "a" would be assigned an individual $u_a \in U$ (an
n-tuple in the case of relations). The statement $P(a)$ would hold in the model if $u_a \in S_P$. In short,
logic is modeled as the logic of subsets of a set. Largely due to the efforts of William Lawvere, the
modern treatment of logic was reformulated and vastly generalized using category theory in what is
now called categorical logic. Subsets were generalized to subobjects or "parts" (equivalence classes
of monomorphisms) so that logic has become the logic of subobjects.^1

There is a duality between subsets of a set and partitions^2 on a set. "The dual notion (obtained
by reversing the arrows) of 'part' is the notion of partition." [23, p. 85] In category theory, this emerges
as the reverse-the-arrows duality between monomorphisms (monos), e.g., injective set functions, and
epimorphisms (epis), e.g., surjective set functions, and between subobjects and quotient objects. If
modern logic is formulated as the logic of subsets, or more generally, subobjects or "parts", then the
question naturally arises of a dual logic that might play the analogous role for partitions and their
generalizations.

Quite aside from category theory duality, it has long been noted in combinatorial mathematics,
e.g., in Gian-Carlo Rota's work in combinatorial theory and probability theory [3], that there is a
type of duality between subsets of a set and partitions on a set. Just as subsets of a set are partially
ordered by inclusion, so partitions on a set are partially ordered by refinement.^3 Moreover, both
partial orderings are in fact lattices (i.e., have meets and joins) with a top element 1 and a bottom
element 0. In the lattice of all subsets $\mathcal{P}(U)$ (the power set) of a set $U$, the meet and join are, of
course, intersection and union while the top element is the universe $U$ and the bottom element is the
null set $\emptyset$. In the lattice of all partitions $\Pi(U)$ on a non-empty set $U$, there are also meet and join
operations (defined later) while the bottom element is the indiscrete partition (the "blob") where all
of $U$ is one block and the top element is the discrete partition where each element of $U$ is a singleton
block.^4

This paper is part of a research programme to develop the general dual logic of partitions. The 
principal novelty in this paper is an analogy between the usual semantics for subset logic and a 
suggested semantics for partition logic; the themes of the paper unfold from that starting point. 



^1 See [23, Appendix A] for a good treatment.

^2 A partition $\pi$ on a set $U$ is usually defined as a mutually exclusive and jointly exhaustive set $\{B\}_{B \in \pi}$ of subsets
or "blocks" $B \subseteq U$. Every equivalence relation on a set $U$ determines a partition on $U$ (with the equivalence classes
as the blocks) and vice-versa. For our purposes, it is useful to think of partitions as binary relations defined as the
complement to an equivalence relation in the set of ordered pairs $U \times U$. Intuitively, they have complementary functions
in the sense that equivalence relations identify while partitions distinguish elements of $U$.

^3 A partition $\pi$ is more refined than a partition $\sigma$, written $\sigma \preceq \pi$, if each block of $\pi$ is contained in some block of $\sigma$.
Much of the older literature (e.g., [5, Example 6, p. 2]) writes this relationship the other way around but, for reasons
that will become clear, we are adopting a newer way of writing refinement (e.g., [14]) so that the more refined partition
is higher in the refinement ordering.

^4 Rota and his students have developed a logic for a special type of equivalence relation (which is rather ubiquitous
in mathematics) using join and meet as the only connectives. [7]
Starting with the analogy between a subset of a set and a partition on the set, the analogue to the
notion of an element of a subset is the notion of a distinction of a partition which is simply an ordered
pair $(u,u') \in U \times U$ in distinct blocks of the partition.^5 The logic of subsets leads to finite probability
theory where events are subsets $S$ of a finite sample space $U$ and which assigns probabilities $\mathrm{Prob}(S)$
to subsets (e.g., the Laplacian equiprobable distribution where $\mathrm{Prob}(S) = |S| / |U|$). Following the
suggested analogies, the logic of partitions similarly leads to a "logical" information theory where
the numerical value naturally assigned to a partition can be seen as the logical information content
or logical entropy $h(\pi)$ of the partition. It is initially defined in a Laplacian manner as the number
of distinctions that a partition makes normalized by the number of ordered pairs of the universe
set $U$. The probability interpretation of $h(\pi)$ is the probability that a random pair from $U \times U$ is
distinguished by $\pi$, just as $\mathrm{Prob}(S)$ is the probability that a random choice from $U$ is an element of
$S$. This logical entropy is precisely related to Shannon's entropy measure [32] so the development of
logical information theory can be seen as providing a new conceptual basis for information theory
at the basic level of logic using "distinctions" as the conceptual atoms.

Historically and conceptually, probability theory started with the simple logical operations on 
subsets (e.g., union, intersection, and complementation) and assigned a numerical measure to subsets 
of a finite set of outcomes (number of favorable outcomes divided by the total number of outcomes). 
Then probability theory "took off" from these simple beginnings to become a major branch of pure
and applied mathematics.

The research programme for partition logic that underlies this paper sees Shannon's information
theory as "taking off" from the simple notions of partition logic in analogy with the conceptual
development of probability theory that starts with simple notions of subset logic. But historically,
Shannon's information theory appeared "as a bolt out of the blue" in a rather sophisticated and
axiomatic form. Moreover, partition logic is still in its infancy today, not to mention over half
a century ago when Shannon's theory was published.^6 But starting with the suggested semantics
for partition logic (i.e., the subset-to-partition and element-to-distinction analogies), we develop the
partition analogue ("counting distinctions") of the beginnings of finite probability theory ("counting
outcomes"), and then we show how it is related to the already-developed information theory of
Shannon. It is in that sense that the developments in the paper provide a logical or conceptual
foundation ("foundation" in the sense of a basic conceptual starting point) for information theory.^7

The following table sets out some of the analogies in a concise form (where the diagonal in $U \times U$
is $\Delta_U = \{(u,u) \mid u \in U\}$).



^5 Intuitively we might think of an element of a set as an "it." We will argue that a distinction or "dit" is the
corresponding logical atom of information. In economics, there is a basic distinction between rivalrous goods (where
more for one means less for another) such as material things ("its") in contrast to non-rivalrous goods (where what
one person acquires does not take away from another) such as ideas, knowledge, and information ("bits" or "dits").
In that spirit, an element of a set represents a material thing, an "it," while the dual notion of a distinction or "dit"
represents the immaterial notion of two "its" being distinct. The distinction between $u$ and $u'$ is the fact that $u \neq u'$,
not a new "thing" or "it." But for mathematical purposes we may represent a distinction by a pair of distinct elements
such as the ordered pair $(u,u')$ which is a higher level "it," i.e., an element in the Cartesian product of a set with
itself (see next section).

^6 For instance, the conceptual beginnings of probability theory in subset logic are shown by the role of Boolean
algebras in probability theory, but what is the corresponding algebra for partition logic?

^7 Perhaps an analogy will be helpful. It is as if the axioms for probability theory had first emerged full-blown from
Kolmogorov [21] and then one realized belatedly that the discipline could be seen as growing out of the starting point
of operations on subsets of a finite space of outcomes where the logic was the logic of subsets.
Table of Analogies

                                   | Subsets                                               | Partitions
"Atoms"                            | Elements                                              | Distinctions
All atoms                          | Universe $U$ (all $u \in U$)                          | Discrete partition 1 (all dits)
No atoms                           | Null set $\emptyset$ (no $u \in U$)                   | Indiscrete partition 0 (no dits)
Model of proposition or event      | Subset $S \subseteq U$                                | Partition $\pi$ on $U$
Model of individual or outcome     | Element $u$ in $U$                                    | Distinction $(u,u')$ in $U \times U - \Delta_U$
Prop. holds or event occurs        | Element $u$ in subset $S$                             | Partition $\pi$ distinguishes $(u,u')$
Lattice of propositions/events     | Lattice of all subsets $\mathcal{P}(U)$               | Lattice of all partitions $\Pi(U)$
Counting measure ($U$ finite)      | # elements in $S$                                     | # dits (as ordered pairs) in $\pi$
Normalized count ($U$ finite)      | $\mathrm{Prob}(S) = \frac{\#\text{ elements in } S}{|U|}$ | $h(\pi) = \frac{\#\text{ distinctions in } \pi}{|U \times U|}$
Prob. interpretation ($U$ finite)  | $\mathrm{Prob}(S)$ = probability that random element $u$ is in $S$ | $h(\pi)$ = probability random pair $(u,u')$ is distinguished by $\pi$


These analogies show one set of reasons why the lattice of partitions $\Pi(U)$ should be written
with the discrete partition as the top element and the indiscrete partition (blob) as the bottom
element of the lattice, in spite of the usual convention of writing the "refinement" ordering the
other way around as what Gian-Carlo Rota called the "unrefinement ordering."

With this motivation, we turn to the development of this conceptual basis for information theory. 



2 Logical Information Theory 

2.1 The Closure Space $U \times U$

Claude Shannon's classic 1948 articles developed a statistical theory of communications that is
ordinarily called "information theory." Shannon built upon the work of Ralph Hartley [15] twenty
years earlier. After Shannon's information theory was presented axiomatically, there was a spate of
new definitions of "entropy" with various axiomatic properties but without concrete (never mind
logical) interpretations [20]. Here we take the approach of starting with a notion that arises naturally
in the logic of partitions, dual to the usual logic of subsets. The notion of a distinction or "dit" is
taken as the logical atom of information and a "logical information theory" is developed based on that
interpretation. When the universe set $U$ is finite, then we have a numerical notion of "information"
or "entropy" $h(\pi)$ of a partition $\pi$ in the number of distinctions normalized by the number of ordered
pairs. This logical "counting distinctions" notion of information or entropy can then be related to
Shannon's measure of information or entropy.

The basic conceptual unit in logical information theory is the distinction or dit (from "DIsTinction"
but motivated by "bit"). A pair $(u,u')$ of distinct elements of $U$ are distinguished by $\pi$, i.e.,
form a dit of $\pi$, if $u$ and $u'$ are in different blocks of $\pi$.^8 A pair $(u,u')$ are identified by $\pi$ and form
an indit (from INDIsTinction or "identification") of the partition if they are contained in the same
block of $\pi$. A partition on $U$ can be characterized by either its dits or indits (just as a subset $S$ of
$U$ can be characterized by the elements added to the null set to arrive at $S$ or by the elements of $U$
thrown out to arrive at $S$). When a partition $\pi$ is thought of as determining an equivalence relation,
then the equivalence relation, as a set of ordered pairs contained in $U \times U = U^2$, is the indit set
$\operatorname{indit}(\pi)$ of indits of the partition. But from the viewpoint of logical information theory, the focus
is on the distinctions, so the partition $\pi$ qua binary relation is given by the complementary dit set
$\operatorname{dit}(\pi)$ of dits where $\operatorname{dit}(\pi) = (U \times U) - \operatorname{indit}(\pi) = \operatorname{indit}(\pi)^c$. Rather than think of the partition as
resulting from identifications made to the elements of $U$ (i.e., distinctions excluded from the discrete
partition), we think of it as being formed by making distinctions starting with the blob. This is
analogous to a subset $S$ being thought of as the set of elements that must be added to the null set
to obtain $S$ rather than the complementary approach to $S$ by giving the elements excluded from
$U$ to arrive at $S$.

^8 One might also develop the theory using unordered pairs $\{u,u'\}$ but the later development of the theory using
probabilistic methods is much facilitated by using ordered pairs $(u,u')$. Thus for $u \neq u'$, $(u,u')$ and $(u',u)$ count as
two distinctions. This means that the count of distinctions in a partition must be normalized by $|U \times U|$. Note that
$U \times U$ includes the diagonal self-pairs $(u,u)$ which can never be distinctions.

From this viewpoint, the natural ordering $\sigma \preceq \pi$ of partitions would be given by
the inclusion ordering of dit-sets $\operatorname{dit}(\sigma) \subseteq \operatorname{dit}(\pi)$ and that is exactly the new way of writing the
refinement relation that we are using, i.e.,

$$\sigma \preceq \pi \quad \text{iff} \quad \operatorname{dit}(\sigma) \subseteq \operatorname{dit}(\pi).$$
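The equivalence between refinement and dit-set inclusion can be checked on a small example. The following is a minimal sketch, assuming a partition is represented as a list of Python sets; the helper names `dit_set` and `refines` are illustrative and do not come from the paper.

```python
from itertools import product

def dit_set(partition):
    """Ordered pairs (u, v) whose members lie in different blocks."""
    block_of = {u: i for i, block in enumerate(partition) for u in block}
    universe = list(block_of)
    return {(u, v) for u, v in product(universe, repeat=2)
            if block_of[u] != block_of[v]}

def refines(pi, sigma):
    """True when every block of pi is contained in some block of sigma."""
    return all(any(set(b) <= set(c) for c in sigma) for b in pi)

sigma = [{1, 2}, {3, 4}]
pi = [{1}, {2}, {3, 4}]      # pi splits the block {1, 2} of sigma

# pi is more refined than sigma, and dit(sigma) is contained in dit(pi).
pi_refines_sigma = refines(pi, sigma)
dits_included = dit_set(sigma) <= dit_set(pi)
```

In this example the more refined partition makes strictly more distinctions, which is why it sits higher in the ordering adopted here.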

There is a natural ("built-in") closure operation on $U \times U$ so that the equivalence relations on
$U$ are given (as binary relations) by the closed sets. A subset $C \subseteq U^2$ is closed if it contains the
diagonal $\{(u,u) \mid u \in U\}$, if $(u,u') \in C$ implies $(u',u) \in C$, and if $(u,u')$ and $(u',u'')$ are in $C$, then
$(u,u'')$ is in $C$. Thus the closed sets of $U^2$ are the reflexive, symmetric, and transitive relations, i.e.,
the equivalence relations on $U$. The intersection of closed sets is closed and the intersection of all
closed sets containing a subset $S \subseteq U^2$ is the closure $\overline{S}$ of $S$.

It should be carefully noted that the closure operation on the closure space $U^2$ is not a topological
closure operation in the sense that the union of two closed sets is not necessarily closed. In spite of
the closure operation not being topological, we may still refer to the complements of closed sets as
being open sets, i.e., the dit sets of partitions on $U$. As usual, the interior $\operatorname{int}(S)$ of any subset $S$ is
defined as the complement of the closure of its complement: $\operatorname{int}(S) = \left(\overline{S^c}\right)^c$.
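The closure and interior operations can be sketched directly from these definitions. This is a minimal illustration, assuming a finite universe given as a list and relations given as sets of ordered pairs; the function names are hypothetical, and the naive fixed-point loop is chosen for readability rather than efficiency.

```python
from itertools import product

def closure(pairs, universe):
    """Smallest equivalence relation on `universe` containing `pairs`."""
    closed = set(pairs) | {(u, u) for u in universe}      # reflexive
    closed |= {(v, u) for (u, v) in closed}               # symmetric
    while True:                                           # transitive
        extra = {(u, w) for (u, v) in closed for (x, w) in closed if v == x}
        if extra <= closed:
            return closed
        closed |= extra

def interior(pairs, universe):
    """Complement of the closure of the complement."""
    full = set(product(universe, repeat=2))
    return full - closure(full - set(pairs), universe)

U = [1, 2, 3]
full = set(product(U, repeat=2))
E1 = closure({(1, 2)}, U)     # equivalence relation with blocks {1, 2}, {3}
E2 = closure({(2, 3)}, U)     # equivalence relation with blocks {1}, {2, 3}
union = E1 | E2

# The union of two closed sets need not be closed: (1, 3) is missing.
not_closed = closure(union, U) != union
# An open set (the dit set of a partition) equals its own interior.
dits_of_E1 = interior(full - E1, U)
```

The example with `E1` and `E2` shows the non-topological behaviour noted above: both relations are closed, but their union only becomes closed after the pairs (1, 3) and (3, 1) are added.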

The open sets of $U \times U$ ordered by inclusion form a lattice isomorphic to the lattice $\Pi(U)$ of
partitions on $U$. The closed sets of $U \times U$ ordered by inclusion form a lattice isomorphic to $\Pi(U)^{op}$,
the opposite of the lattice of partitions on $U$ (formed by turning around the partial order). The
motivation for writing the refinement relation in the old way was probably that equivalence relations
were thought of as binary relations $\operatorname{indit}(\pi) \subseteq U \times U$, so the ordering of equivalence relations was
written to reflect the inclusion ordering between indit-sets. But since a partition and an equivalence
relation were then taken as essentially the "same thing," i.e., a set $\{B\}_{B \in \pi}$ of mutually exclusive
and jointly exhaustive subsets ("blocks" or "equivalence classes") of $U$, that way of writing the
ordering carried over to partitions. But we identify a partition $\pi$ as a binary relation with its dit-set
$\operatorname{dit}(\pi) = U \times U - \operatorname{indit}(\pi)$ so our refinement ordering is the inclusion ordering between dit-sets (the
opposite of the inclusion ordering of indit-sets).^9

Given two partitions $\pi$ and $\sigma$ on $U$, the open set corresponding to the join $\pi \vee \sigma$ of the partitions
is the partition whose dit-set is the union of their dit-sets:^10

$$\operatorname{dit}(\pi \vee \sigma) = \operatorname{dit}(\pi) \cup \operatorname{dit}(\sigma).$$

The open set corresponding to the meet $\pi \wedge \sigma$ of partitions is the interior of the intersection of their
dit-sets:^11

$$\operatorname{dit}(\pi \wedge \sigma) = \operatorname{int}\left(\operatorname{dit}(\pi) \cap \operatorname{dit}(\sigma)\right).$$

The open set corresponding to the bottom or blob 0 is the null set $\emptyset \subseteq U \times U$ (no distinctions)
and the open set corresponding to the discrete partition or top 1 is the complement of the diagonal,
$U \times U - \Delta_U$ (all distinctions).
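The join and meet can be computed from dit sets on a small universe. The sketch below assumes partitions are lists of Python sets; `partition_from_dits` is a hypothetical helper that recovers the blocks of the partition whose dit set is the interior of a given set of pairs, by merging every pair that is not a distinction.

```python
from itertools import product

def dit_set(partition):
    """Ordered pairs (u, v) whose members lie in different blocks."""
    block_of = {u: i for i, b in enumerate(partition) for u in b}
    return {(u, v) for u, v in product(block_of, repeat=2)
            if block_of[u] != block_of[v]}

def partition_from_dits(dits, universe):
    """Blocks of the partition whose dit set is the interior of `dits`.

    Pairs outside `dits` are treated as identifications and merged
    transitively, which amounts to closing the complement of `dits`.
    """
    blocks = [{u} for u in universe]
    for u, v in product(universe, repeat=2):
        if (u, v) not in dits:
            bu = next(b for b in blocks if u in b)
            bv = next(b for b in blocks if v in b)
            if bu is not bv:
                blocks.remove(bv)
                bu |= bv
    return {frozenset(b) for b in blocks}

U = [1, 2, 3, 4]
pi = [{1, 2}, {3, 4}]
sigma = [{1, 3}, {2, 4}]

common = dit_set(pi) & dit_set(sigma)                    # not itself open
join = partition_from_dits(dit_set(pi) | dit_set(sigma), U)
meet = partition_from_dits(common, U)
```

Here the join is the discrete partition and the meet is the blob, even though the two partitions share four distinctions: the intersection of the dit sets is non-empty but has empty interior.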

^9 One way to establish the duality between elements of subsets and distinctions in a partition is to start with the
refinement relation as the partial order in the lattice of partitions $\Pi(U)$ analogous to the inclusion partial order in
the lattice of subsets $\mathcal{P}(U)$. Then the mapping $\pi \mapsto \operatorname{dit}(\pi)$ represents the lattice of partitions as the lattice of open
subsets of the closure space $U \times U$ with inclusion as the partial order. Then the analogue of the elements in the subsets
of $\mathcal{P}(U)$ would be the elements in the subsets $\operatorname{dit}(\pi)$ representing the partitions, namely, the distinctions.

^10 Note that this union of dit sets gives the dit set of the "meet" in the old reversed way of writing the refinement
ordering.

^11 Note that this is the "join" in the old reversed way of writing the refinement ordering. This operation defined by the
interior operator of the non-topological closure operation leads to "anomalous" results such as the non-distributivity
of the partition lattice, in contrast to the distributivity of the lattice of open sets of a topological space.
2.2 Some Set Structure Theorems



Before restricting ourselves to finite $U$ to use the counting measure $|\operatorname{dit}(\pi)|$, there are a few structure
theorems that are independent of cardinality. If the "atom" of information is the dit then the atomic
information in a partition $\pi$ "is" its dit set, $\operatorname{dit}(\pi)$. The information common to two partitions $\pi$
and $\sigma$, their mutual information set, would naturally be the intersection of their dit sets (which is
not necessarily the dit set of a partition):

$$\operatorname{Mut}(\pi, \sigma) = \operatorname{dit}(\pi) \cap \operatorname{dit}(\sigma).$$

Shannon deliberately defined his measure of information so that it would be "additive" in the sense
that the measure of information in two independent probability distributions would be the sum of the
information measures of the two separate distributions and there would be zero mutual information
between the independent distributions. But this is not true at the logical level with information
defined as distinctions. There is always mutual information between two non-blob partitions, even
though the interior of $\operatorname{Mut}(\pi, \sigma)$ might be empty, i.e., $\operatorname{int}(\operatorname{Mut}(\pi, \sigma)) = \operatorname{int}(\operatorname{dit}(\pi) \cap \operatorname{dit}(\sigma)) =
\operatorname{dit}(\pi \wedge \sigma)$ might be empty so that $\pi \wedge \sigma = 0$.

Proposition 1 Given two partitions $\pi$ and $\sigma$ on $U$ with $\pi \neq 0 \neq \sigma$, $\operatorname{Mut}(\pi, \sigma) \neq \emptyset$.^12

Since $\pi$ is not the blob, consider two elements $u$ and $u'$ distinguished by $\pi$ but identified by $\sigma$
[otherwise $(u,u') \in \operatorname{Mut}(\pi, \sigma)$]. Since $\sigma$ is also not the blob, there must be a third element $u''$ not in
the same block of $\sigma$ as $u$ and $u'$. But since $u$ and $u'$ are in different blocks of $\pi$, the third element
$u''$ must be distinguished from one or the other or both in $\pi$. Hence $(u,u'')$ or $(u',u'')$ must be
distinguished by both partitions and thus must be in their mutual information set $\operatorname{Mut}(\pi, \sigma)$. $\blacksquare$ (=
end of proof marker)
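Proposition 1 can be confirmed by exhaustive search on a small universe. The following is a sketch under the assumption that a four-element universe is an adequate sanity check (it is not a substitute for the proof above); `set_partitions` is a hypothetical helper that enumerates all partitions of a list.

```python
from itertools import product

def set_partitions(items):
    """Yield every partition of `items` as a list of lists."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + smaller

def dit_set(partition):
    """Ordered pairs (u, v) whose members lie in different blocks."""
    block_of = {u: i for i, b in enumerate(partition) for u in b}
    return {(u, v) for u, v in product(block_of, repeat=2)
            if block_of[u] != block_of[v]}

universe = [1, 2, 3, 4]
all_partitions = list(set_partitions(universe))
non_blob = [p for p in all_partitions if len(p) > 1]

# Pairs of non-blob partitions with an empty mutual information set.
counterexamples = [(p, s) for p in non_blob for s in non_blob
                   if not dit_set(p) & dit_set(s)]
```

The list `counterexamples` comes out empty, in line with the proposition.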

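The closed and open subsets of $U^2$ can be characterized using the usual notions of blocks of a
partition. Given a partition $\pi$ on $U$ as a set of blocks $\pi = \{B\}_{B \in \pi}$, let $B \times B'$ be the Cartesian
product of $B$ and $B'$. Then

$$\operatorname{indit}(\pi) = \bigcup_{B \in \pi} B \times B$$

$$\operatorname{dit}(\pi) = \bigcup_{\substack{B \neq B' \\ B, B' \in \pi}} B \times B' = U \times U - \operatorname{indit}(\pi) = \operatorname{indit}(\pi)^c.$$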

The mutual information set can also be characterized in this manner.

Proposition 2 Given partitions $\pi$ and $\sigma$ with blocks $\{B\}_{B \in \pi}$ and $\{C\}_{C \in \sigma}$, then

$$\operatorname{Mut}(\pi, \sigma) = \bigcup_{B \in \pi, C \in \sigma} (B - (B \cap C)) \times (C - (B \cap C)) = \bigcup_{B \in \pi, C \in \sigma} (B - C) \times (C - B).$$

The union (which is a disjoint union) will include the pairs $(u,u')$ where for some $B \in \pi$ and $C \in \sigma$,
$u \in B - (B \cap C)$ and $u' \in C - (B \cap C)$. Since $u'$ is in $C$ but not in the intersection $B \cap C$, it
must be in a different block of $\pi$ than $B$ so $(u,u') \in \operatorname{dit}(\pi)$. Symmetrically, $(u,u') \in \operatorname{dit}(\sigma)$ so
$(u,u') \in \operatorname{Mut}(\pi, \sigma) = \operatorname{dit}(\pi) \cap \operatorname{dit}(\sigma)$. Conversely if $(u,u') \in \operatorname{Mut}(\pi, \sigma)$ then take the $B$ containing
$u$ and the $C$ containing $u'$. Since $(u,u')$ is distinguished by both partitions, $u \notin C$ and $u' \notin B$ so
that $(u,u') \in (B - (B \cap C)) \times (C - (B \cap C))$. $\blacksquare$
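The block characterization in Proposition 2 can be compared with the direct definition on an example. This is a minimal sketch assuming partitions are lists of Python sets; `mut_by_blocks` is an illustrative name for the union of the products $(B - C) \times (C - B)$.

```python
from itertools import product

def dit_set(partition):
    """Ordered pairs (u, v) whose members lie in different blocks."""
    block_of = {u: i for i, b in enumerate(partition) for u in b}
    return {(u, v) for u, v in product(block_of, repeat=2)
            if block_of[u] != block_of[v]}

def mut_by_blocks(pi, sigma):
    """Union over blocks B of pi and C of sigma of (B - C) x (C - B)."""
    pairs = set()
    for B, C in product(pi, sigma):
        pairs |= set(product(B - C, C - B))
    return pairs

pi = [{1, 2}, {3, 4, 5}]
sigma = [{1, 3}, {2, 4}, {5}]

by_blocks = mut_by_blocks(pi, sigma)
by_definition = dit_set(pi) & dit_set(sigma)
```

Both computations give the same set of pairs; for instance (1, 4) arises from $B = \{1, 2\}$ and $C = \{2, 4\}$.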

^12 The contrapositive of this proposition is interesting. Given two equivalence relations $E_1, E_2 \subseteq U^2$, if every pair
of elements $u,u' \in U$ is identified by one or the other of the relations, i.e., $E_1 \cup E_2 = U^2$, then either $E_1 = U^2$ or
$E_2 = U^2$.
2.3 Logical Information Theory on Finite Sets

For a finite set $U$, the (normalized) "counting distinctions" measure of information can be defined
and compared to Shannon's measure for finite probability distributions. Since the information set
of a partition $\pi$ on $U$ is its set of distinctions $\operatorname{dit}(\pi)$, the unnormalized numerical measure of the
information of a partition is simply the count of that set, $|\operatorname{dit}(\pi)|$ ("dit count"). But to account for
the total number of ordered pairs of elements from $U$, we normalize by $|U \times U| = |U|^2$ to obtain the
logical information content or logical entropy of a partition $\pi$ as its normalized dit count:

$$h(\pi) = \frac{|\operatorname{dit}(\pi)|}{|U \times U|}.$$



Probability theory started with the finite case where there was a finite set $U$ of possibilities
(the finite sample space) and an event was a subset $S \subseteq U$. Under the Laplacian assumption that
each outcome was equiprobable, the probability of the event $S$ was the similar normalized counting
measure of the set:

$$\mathrm{Prob}(S) = \frac{|S|}{|U|}.$$

This is the probability that any randomly chosen element of $U$ is an element of the subset $S$. In view of
the dual relationship between being in a subset and being distinguished by a partition, the analogous
concept would be the probability that an ordered pair $(u,u')$ of elements of $U$ chosen independently
(i.e., with replacement^13) would be distinguished by a partition $\pi$, and that is precisely the logical
entropy $h(\pi) = |\operatorname{dit}(\pi)| / |U \times U|$ (since each pair randomly chosen from $U \times U$ is equiprobable).

Probabilistic interpretation: $h(\pi)$ = probability a random pair is distinguished by $\pi$.
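The normalized dit count can be computed by direct enumeration of $U \times U$. The sketch below assumes a partition given as a list of Python sets and uses exact fractions; the function name `logical_entropy` is illustrative.

```python
from fractions import Fraction
from itertools import product

def logical_entropy(partition):
    """Fraction of ordered pairs in U x U that the partition distinguishes."""
    block_of = {u: i for i, b in enumerate(partition) for u in b}
    universe = list(block_of)
    dits = sum(1 for u, v in product(universe, repeat=2)
               if block_of[u] != block_of[v])
    return Fraction(dits, len(universe) ** 2)

pi = [{"a", "b"}, {"c"}, {"d"}]
h = logical_entropy(pi)            # 10 dits out of 16 ordered pairs

blob = [{"a", "b", "c", "d"}]
discrete = [{"a"}, {"b"}, {"c"}, {"d"}]
h_blob = logical_entropy(blob)
h_discrete = logical_entropy(discrete)
```

Because pairs are drawn with replacement, the diagonal pairs are counted in the denominator, so even the discrete partition has entropy $1 - 1/|U|$ rather than 1.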



In finite probability theory, when a point is sampled from the sample space $U$, we say the event
$S$ occurs if the point $u$ was an element in $S \subseteq U$. When a random pair $(u,u')$ is sampled from
the sample space $U \times U$, we say the partition $\pi$ distinguishes^14 if the pair is distinguished by the
partition, i.e., if $(u,u') \in \operatorname{dit}(\pi) \subseteq U \times U$. Then just as we take $\mathrm{Prob}(S)$ as the probability that the
event $S$ occurs, so the logical entropy $h(\pi)$ is the probability that the partition $\pi$ distinguishes.
Since $\operatorname{dit}(\pi \vee \sigma) = \operatorname{dit}(\pi) \cup \operatorname{dit}(\sigma)$,

probability that $\pi \vee \sigma$ distinguishes = $h(\pi \vee \sigma)$ = probability that $\pi$ or $\sigma$ distinguishes.



The probability that a randomly chosen pair would be distinguished by $\pi$ and $\sigma$ would be given
by the relative cardinality of the mutual information set which is called the mutual information of
the partitions:

Mutual logical information: $m(\pi, \sigma) = \frac{|\operatorname{Mut}(\pi, \sigma)|}{|U|^2}$ = probability that $\pi$ and $\sigma$ both distinguish.



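Since the cardinality of intersections of sets can be analyzed using the inclusion-exclusion principle,
we have:

$$|\operatorname{Mut}(\pi, \sigma)| = |\operatorname{dit}(\pi) \cap \operatorname{dit}(\sigma)| = |\operatorname{dit}(\pi)| + |\operatorname{dit}(\sigma)| - |\operatorname{dit}(\pi) \cup \operatorname{dit}(\sigma)|.$$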

Normalizing, the probability that a random pair is distinguished by both partitions is given by the
modular law:

$$m(\pi, \sigma) = \frac{|\operatorname{dit}(\pi) \cap \operatorname{dit}(\sigma)|}{|U|^2} = \frac{|\operatorname{dit}(\pi)|}{|U|^2} + \frac{|\operatorname{dit}(\sigma)|}{|U|^2} - \frac{|\operatorname{dit}(\pi) \cup \operatorname{dit}(\sigma)|}{|U|^2} = h(\pi) + h(\sigma) - h(\pi \vee \sigma).$$

^13 Drawing with replacement would allow diagonal pairs $(u,u)$ to be drawn and requires $|U \times U|$ as the normalizing
factor.

^14 Equivalent terminology would be "differentiates" or "discriminates."

This can be extended by the inclusion-exclusion principle to any number of partitions. The mutual
information set $\operatorname{Mut}(\pi, \sigma)$ is not the dit-set of a partition but its interior is the dit-set of the meet
so the logical entropies of the join and meet satisfy the:

Submodular inequality: $h(\pi \wedge \sigma) + h(\pi \vee \sigma) \leq h(\pi) + h(\sigma)$.
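The modular law and the submodular inequality can be checked numerically. This is a minimal sketch assuming partitions as lists of Python sets on a six-element universe; the `meet` helper, which merges overlapping blocks, is a hypothetical shortcut for taking the interior of the intersection of the dit sets.

```python
from fractions import Fraction
from itertools import product

def dit_set(partition):
    """Ordered pairs (u, v) whose members lie in different blocks."""
    block_of = {u: i for i, b in enumerate(partition) for u in b}
    return {(u, v) for u, v in product(block_of, repeat=2)
            if block_of[u] != block_of[v]}

def meet(pi, sigma):
    """Blocks of the meet: merge any blocks of pi and sigma that overlap."""
    blocks = []
    for block in list(pi) + list(sigma):
        block = set(block)
        for b in [b for b in blocks if b & block]:
            blocks.remove(b)
            block |= b
        blocks.append(block)
    return blocks

pi = [{1, 2}, {3, 4}, {5, 6}]
sigma = [{1, 3}, {2, 4}, {5}, {6}]
n_pairs = 6 ** 2

h_pi = Fraction(len(dit_set(pi)), n_pairs)
h_sigma = Fraction(len(dit_set(sigma)), n_pairs)
h_join = Fraction(len(dit_set(pi) | dit_set(sigma)), n_pairs)
h_meet = Fraction(len(dit_set(meet(pi, sigma))), n_pairs)
m = Fraction(len(dit_set(pi) & dit_set(sigma)), n_pairs)
```

In this example the mutual information $m(\pi, \sigma)$ strictly exceeds $h(\pi \wedge \sigma)$, which is the gap that turns the modular law into an inequality for the meet.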

2.4 Using General Finite Probability Distributions 

Since the logical entropy of a partition on a finite set can be given a simple probabilistic interpretation,
it is not surprising that many methods of probability theory can be harnessed to develop the
theory. The theory for the finite case can be developed at two different levels of generality, using 
the specific Laplacian equiprobability distribution on the finite set U or using an arbitrary finite 
probability distribution. Correctly formulated, all the formulas concerning logical entropy and the 
related concepts will work for the general case, but our purpose is not mathematical generality. Our 
purpose is to give the basic motivating example of logical entropy based on "counting distinctions" 
and to show its relationship to Shannon's notion of entropy, thereby clarifying the logical foundations 
of the latter concept. 

Every probability distribution on a finite set $U$ gives a probability $p_B$ for each block $B$ in
a partition $\pi$ but for the Laplacian distribution, it is just the relative cardinality of the block:
$p_B = \frac{|B|}{|U|}$ for blocks $B \in \pi$. Since there are no empty blocks, $p_B > 0$ and $\sum_{B \in \pi} p_B = 1$. Since the dit
set of a partition is $\operatorname{dit}(\pi) = \bigcup_{B \neq B'} B \times B'$, its size is $|\operatorname{dit}(\pi)| = \sum_{B \neq B'} |B| |B'| = \sum_{B \in \pi} |B| |U - B|$.

Thus the logical information or entropy in a partition as the normalized size of the dit set can be
developed as follows:

$$h(\pi) = \frac{|\operatorname{dit}(\pi)|}{|U \times U|} = \sum_{B \neq B'} p_B p_{B'} = \sum_{B \in \pi} p_B (1 - p_B) = 1 - \sum_{B \in \pi} p_B^2.$$



Having defined and interpreted logical entropy in terms of the distinctions of a set partition,
we may, if desired, "kick away the ladder" and define the logical entropy of any finite probability
distribution $p = \{p_1, \ldots, p_n\}$ as:

$$h(p) = \sum_{i=1}^{n} p_i (1 - p_i) = 1 - \sum_{i=1}^{n} p_i^2.$$



The probabilistic interpretation is that $h(p)$ is the probability that two independent draws (from
the sample space of n points with these probabilities) will give distinct points.^15
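The two-draw interpretation can be verified by summing over all pairs of distinct outcomes. The sketch below assumes a distribution given as a list of exact fractions; the function names are illustrative.

```python
from fractions import Fraction

def logical_entropy(p):
    """h(p) = 1 - sum of squared probabilities."""
    return 1 - sum(q * q for q in p)

def prob_distinct_draws(p):
    """Probability that two independent draws land on different points."""
    n = len(p)
    return sum(p[i] * p[j] for i in range(n) for j in range(n) if i != j)

p = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
h = logical_entropy(p)
two_draws = prob_distinct_draws(p)
```

These are the block probabilities of a partition with blocks of sizes 2, 1, and 1 on a four-element set under the Laplacian assumption, so the value agrees with the normalized dit count 10/16 for such a partition.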

2.5 A Brief History of the Logical Entropy Formula: $h(p) = 1 - \sum_i p_i^2$

The logical entropy formula $h(p) = 1 - \sum_i p_i^2$ is motivated as the normalized count of the distinctions
made by a partition, $|\operatorname{dit}(\pi)| / |U|^2$, when the probabilities are the block probabilities
$p_B = \frac{|B|}{|U|}$ of a partition on a set $U$ (under a Laplacian assumption). The complementary measure
$1 - h(p) = \sum_i p_i^2$ would be motivated as the normalized count of the identifications made by a
partition, $|\operatorname{indit}(\pi)| / |U|^2$, thought of as an equivalence relation. Thus $1 - \sum_i p_i^2$, motivated by
distinctions, is a measure of heterogeneity or diversity, while the complementary measure $\sum_i p_i^2$,
motivated by identifications, is a measure of homogeneity or concentration. Historically, the formula
can be found in either form depending on the particular context. The $p_i$'s might be relative shares
such as the relative share of organisms of the $i$-th species in some population of organisms, and then
the interpretation of $p_i$ as a probability arises by considering the random choice of an organism from
the population.

^15 Note that we can always rephrase in terms of partitions by taking $h(p)$ as the entropy $h(1)$ of the discrete partition
on $U = \{u_1, \ldots, u_n\}$ with the $p_i$'s as the probabilities of the singleton blocks $\{u_i\}$ of the discrete partition.

According to I. J. Good, the formula has a certain naturalness: "If $p_1, \ldots, p_t$ are the probabilities
of t mutually exclusive and exhaustive events, any statistician of this century who wanted a measure
of homogeneity would have taken about two seconds to suggest $\sum p_i^2$ which I shall call $\rho$." [13, p. 561]
As noted by Bhargava and Uppuluri [4], the formula $1 - \sum p_i^2$ was used by Gini in 1912 ([10] reprinted
in [11, p. 369]) as a measure of "mutability" or diversity. But another development of the formula
(in the complementary form) in the early twentieth century was in cryptography. The American
cryptologist, William F. Friedman, devoted a 1922 book ([8]) to the "index of coincidence" (i.e.,
$\sum p_i^2$). Solomon Kullback (see the Kullback-Leibler divergence treated later) worked as an assistant
to Friedman and wrote a book on cryptology which used the index.

During World War II, Alan M. Turing worked for a time in the Government Code and Cypher
School at the Bletchley Park facility in England. Probably unaware of the earlier work, Turing used
$\rho = \sum p_i^2$ in his cryptanalysis work and called it the repeat rate since it is the probability of a repeat
in a pair of independent draws from a population with those probabilities (i.e., the identification
probability $1 - h(p)$). Polish cryptanalysts had independently used the repeat rate in their work
on the Enigma [27].

After the war, Edward H. Simpson, a British statistician, proposed $\sum_{B \in \pi} p_B^2$ as a measure
of species concentration (the opposite of diversity) where $\pi$ is the partition of animals or plants
according to species and where each animal or plant is considered as equiprobable. And Simpson
gave the interpretation of this homogeneity measure as "the probability that two individuals chosen
at random and independently from the population will be found to belong to the same group." [33, p.
688] Hence $1 - \sum_{B \in \pi} p_B^2$ is the probability that a random ordered pair will belong to different species,
i.e., will be distinguished by the species partition. In the biodiversity literature [31], the formula is
known as "Simpson's index of diversity" or sometimes, the "Gini-Simpson diversity index." However,
Simpson along with I. J. Good worked at Bletchley during WWII, and, according to Good, "E. H.
Simpson and I both obtained the notion [the repeat rate] from Turing." [12, p. 395] When Simpson
published the index in 1948, he (again, according to Good) did not acknowledge Turing "fearing
that to acknowledge him would be regarded as a breach of security." [13, p. 562]

In 1945, Albert O. Hirschman ([18, p. 159] and [19]) suggested using $\sqrt{\sum p_i^2}$ as an index of
trade concentration (where $p_i$ is the relative share of trade in a certain commodity or with a certain
partner). A few years later, Orris Herfindahl [17] independently suggested using $\sum p_i^2$ as an index
of industrial concentration (where $p_i$ is the relative share of the $i$th firm in an industry). In the
industrial economics literature, the index $H = \sum p_i^2$ is variously called the Hirschman-Herfindahl
index, the HH index, or just the H index of concentration. If all the relative shares were equal (i.e.,
$p_i = 1/n$), then the identification or repeat probability is just the probability of drawing any element,
i.e., $H = 1/n$, so $1/H = n$ is the number of equal elements. This led to the "numbers equivalent"
interpretation of the reciprocal of the H index [5]. In general, given an event with probability $p_0$, the
"numbers-equivalent" interpretation of the event is that it is "as if" an element was drawn out of a set
of $1/p_0$ equiprobable elements (it is "as if" since $1/p_0$ need not be an integer). This numbers-equivalent
idea is related to the "block-count" notion of entropy defined later.
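The numbers-equivalent reading of the H index can be illustrated with a minimal Python sketch. The function names `herfindahl` and `numbers_equivalent` and the sample market shares are assumptions made only for this illustration; they do not come from the cited literature.

```python
def herfindahl(shares):
    """H = sum of squared relative shares, i.e., the repeat probability."""
    total = sum(shares)
    return sum((s / total) ** 2 for s in shares)


def numbers_equivalent(shares):
    """1/H: the number of equal-sized firms that would give the same H."""
    return 1.0 / herfindahl(shares)


# Hypothetical market shares (illustrative data only).
equal = [25, 25, 25, 25]
skewed = [70, 10, 10, 10]

print(herfindahl(equal), numbers_equivalent(equal))
print(herfindahl(skewed), numbers_equivalent(skewed))
```

With four equal shares the numbers equivalent is 4, while the skewed industry behaves "as if" it had fewer than two equal firms, which is why $1/H$ need not be an integer.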

In view of the frequent and independent discovery and rediscovery of the formula $\rho = \sum p_i^2$ or
its complement $1 - \sum p_i^2$ by Gini, Friedman, Turing, Hirschman, Herfindahl, and no doubt others,
I. J. Good wisely advises that "it is unjust to associate $\rho$ with any one person." [13, p. 562]^13

After Shannon's axiomatic introduction of his entropy [32], there was a proliferation of axiomatic
entropies with a variable parameter.^14 The formula $1 - \sum p_i^2$ for logical entropy appeared as a

^13 The name "logical entropy" for $1 - \sum p_i^2$ not only denotes the basic status of the formula, it avoids "Stigler's Law
of Eponymy": "No scientific discovery is named after its original discoverer." [34, p. 277]

^14 There was no need for Shannon to present his entropy concept axiomatically since it was based on a standard

special case for a specific parameter value in several cases. During the 1960s, Aczél and Daróczy [1]
developed the generalized entropies of degree $\alpha$:

$$H_n^{\alpha}(p_1,\ldots,p_n) = \frac{\sum_i p_i^{\alpha} - 1}{2^{1-\alpha} - 1}$$

and the logical entropy occurred as half the value for $\alpha = 2$. That formula also appeared as Havrda-
Charvát's structural $\alpha$-entropy [16]:

$$S(p_1,\ldots,p_n;\alpha) = \frac{2^{\alpha-1}}{2^{\alpha-1} - 1}\left(1 - \sum_i p_i^{\alpha}\right)$$

and the special case of $\alpha = 2$ was considered by Vajda [36].

Patil and Taillie [25] defined the diversity index of degree $\beta$ in 1982:

$$\Delta_{\beta} = \frac{1 - \sum_i p_i^{\beta+1}}{\beta}$$

and Tsallis [35] independently gave the same formula as an entropy formula in 1988:

$$S_q(p_1,\ldots,p_n) = \frac{1 - \sum_i p_i^{q}}{q - 1}$$

where the logical entropy formula occurs as a special case ($\beta = 1$ or $q = 2$). While the generalized
parametric entropies may be interesting as axiomatic exercises, our purpose is to emphasize the
specific logical interpretation of the logical entropy formula (or its complement).

From the logical viewpoint, two elements from $U = \{u_1,\ldots,u_n\}$ are either identical or distinct.
Gini [10] introduced $d_{ij}$ as the "distance" between the $i$th and $j$th elements where $d_{ij} = 1$ for $i \neq j$
and $d_{ii} = 0$. Since $1 = (p_1 + \ldots + p_n)(p_1 + \ldots + p_n) = \sum_i p_i^2 + \sum_{i\neq j} p_i p_j$, the logical entropy, i.e.,
Gini's index of mutability, $h(p) = 1 - \sum_i p_i^2 = \sum_{i\neq j} p_i p_j$, is the average logical distance between a
pair of independently drawn elements. But one might generalize by allowing other distances $d_{ij} = d_{ji}$
for $i \neq j$ (but always $d_{ii} = 0$) so that $Q = \sum_{i\neq j} d_{ij} p_i p_j$ would be the average distance between
a pair of independently drawn elements from $U$. In 1982, C. R. (Calyampudi Radhakrishna) Rao
introduced precisely this concept as quadratic entropy [26] (which was later rediscovered in the
biodiversity literature as the "Avalanche Index" by Ganeshaiah et al. [9]). In many domains, it is
quite reasonable to move beyond the bare-bones logical distance of $d_{ij} = 1$ for $i \neq j$ so that Rao's
quadratic entropy is a useful and easily interpreted generalization of logical entropy.
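As a small numerical check of this reduction, the following Python sketch computes $Q = \sum_{i\neq j} d_{ij} p_i p_j$ and compares it with $1 - \sum_i p_i^2$ for the 0/1 logical distance. The function names, the probability vector, and the second distance matrix are hypothetical values chosen only for illustration.

```python
def logical_entropy(p):
    """h(p) = 1 - sum_i p_i^2."""
    return 1.0 - sum(pi * pi for pi in p)


def quadratic_entropy(p, d):
    """Q = sum over i, j of d[i][j] * p[i] * p[j] (the diagonal of d is 0)."""
    n = len(p)
    return sum(d[i][j] * p[i] * p[j] for i in range(n) for j in range(n))


p = [0.5, 0.3, 0.2]  # illustrative probabilities

# Logical distance: 1 between distinct elements, 0 on the diagonal.
logical_d = [[0 if i == j else 1 for j in range(3)] for i in range(3)]

# Some other symmetric distance with zero diagonal (made-up values).
other_d = [[0, 1, 4], [1, 0, 2], [4, 2, 0]]

print(logical_entropy(p), quadratic_entropy(p, logical_d))
print(quadratic_entropy(p, other_d))
```

The first line prints two equal values (up to rounding), showing that the logical entropy is the quadratic entropy for the bare-bones logical distance.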



3 Relationship between the Logical and Shannon Entropies 

3.1 The Search Approach to Find the "Sent Message" 

The logical entropy $h(\pi) = \sum_{B\in\pi} p_B (1 - p_B)$ in this form as an average over blocks allows a direct
comparison with Shannon's entropy $H(\pi) = \sum_{B\in\pi} p_B \log_2\left(\frac{1}{p_B}\right)$ of the partition which is also an
average over the blocks. What is the connection between the block entropies $h(B) = 1 - p_B$ and
$H(B) = \log_2\left(\frac{1}{p_B}\right)$? Shannon uses reasoning (shared with Hartley) to arrive at a notion of entropy
or information content for an element out of a subset (e.g., a block in a partition as a set of blocks).
Then for a partition $\pi$, Shannon averages the block values to get the partition value $H(\pi)$. Hartley
and Shannon start with the question of the information required to single an element $u$ out of
a set $U$, e.g., to single out the sent message from the set of possible messages. Alfred Rényi has

concrete interpretation (expected number of binary partitions needed to distinguish a designated element) which could
then be generalized. The axiomatic development encouraged the presentation of other "entropies" as if the axioms
eliminated or, at least, relaxed any need for an interpretation of the "entropy" concept.

also emphasized this "search-theoretic" approach to information theory (see [28], [29], or numerous
papers in [30]).^15

One intuitive measure of the information obtained by determining the designated element in
a set $U$ of equiprobable elements would just be the cardinality $|U|$ of the set, and, as we will see,
that leads to a multiplicative "block-count" version of Shannon's entropy. But Hartley and Shannon
wanted the additivity that comes from taking the logarithm of the set size $|U|$. If $|U| = 2^n$ then
this allows the crucial Shannon interpretation of $\log_2(|U|) = n$ as the minimum number of yes-or-no
questions (binary partitions) it takes to single out any designated element (the "sent message") of
the set. In a mathematical version of the game of twenty questions (like Rényi's Hungarian game
of "Bar-Kochba"), think of each element of $U$ as being assigned a unique binary number with $n$
digits. Then the minimum $n$ questions can just be the questions asking for the $i$th binary digit of
the hidden designated element. Each answer gives one bit (short for "binary digit") of information.
With this motivation for the case of $|U| = 2^n$, Shannon and Hartley take $\log(|U|)$ as the measure of
the information required to single out a hidden element in a set with $|U|$ equiprobable elements.^16
That extends the "minimum number of yes-or-no questions" motivation from $|U| = 2^n$ to any finite
set $U$ with $|U|$ equiprobable elements. If a partition $\pi$ had equiprobable blocks, then the Shannon
entropy would be $H(B) = \log(|\pi|)$ where $|\pi|$ is the number of blocks.
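The binary-digit version of twenty questions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the elements of $U$ are taken to be the integers $0,\ldots,2^n - 1$, and the function name `single_out` is hypothetical.

```python
def single_out(hidden, n):
    """Find `hidden` in range(2**n) by asking for each of its n binary digits.

    Returns the element that remains and the number of questions asked.
    """
    candidates = list(range(2 ** n))
    questions = 0
    for i in range(n):
        answer = (hidden >> i) & 1  # truthful yes-or-no answer to question i
        # Keep only the candidates whose i-th binary digit matches the answer.
        candidates = [u for u in candidates if (u >> i) & 1 == answer]
        questions += 1
    assert len(candidates) == 1  # n binary partitions leave a singleton block
    return candidates[0], questions


print(single_out(hidden=11, n=4))
```

Each question halves the set of candidates, so $n = \log_2(|U|)$ questions always suffice when $|U| = 2^n$.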

To extend this basic idea to sets of elements which are not equiprobable (e.g., partitions with
unequal blocks), it is useful to use an old device to restate any positive probability as a chance among
equiprobable elements. If $p_i = 0.02$, then there is a 1 in $50 = 1/p_i$ chance of the $i$th outcome occurring
in any trial. It is "as if" the outcome was one among $1/p_i$ equiprobable outcomes.^17 Thus each
positive probability $p_i$ has an associated equivalent number $1/p_i$ which is the size of the hypothetical
set of equiprobable elements so that the probability of drawing any given element is $p_i$.^18

Given a partition $\{B\}_{B\in\pi}$ with unequal blocks, we motivate the block entropy $H(B)$ for a block
with probability $p_B$ by taking it as the entropy for a hypothetical numbers-equivalent partition $\pi_B$
with $1/p_B$ equiprobable blocks, i.e.,

$$H(B) = \log(|\pi_B|) = \log\left(\frac{1}{p_B}\right).$$

With this motivation, the Shannon entropy of the partition is then defined as the arithmetical
average of the block entropies:

$$H(\pi) = \sum_{B\in\pi} p_B H(B) = \sum_{B\in\pi} p_B \log\left(\frac{1}{p_B}\right).$$

This can be directly compared to the logical entropy $h(\pi) = \sum_{B\in\pi} p_B (1 - p_B)$ which arose
from quite different distinction-based reasoning (e.g., where the search of a single designated element
played no role). Nevertheless, the formula $\sum_{B\in\pi} p_B (1 - p_B)$ can be viewed as an average over the
quantities which play the role of "block entropies" $h(B) = 1 - p_B$. But this "block entropy" cannot
be directly interpreted as a (normalized) dit count since there is no such thing as the dit count for
a single block. The dits are the pairs of elements in distinct blocks.

^15 In Gian-Carlo Rota's teaching, he supposed that the Devil had picked an element out of $U$ and would not reveal
its identity. But when given a binary partition (i.e., a yes-or-no question), the Devil had to truthfully tell which block
contained the hidden element. Hence the problem was to find the minimum number of binary partitions needed to
force the Devil to reveal the hidden element.

^16 Hartley used logs to the base 10 but here all logs are to base 2 unless otherwise indicated. Instead of considering
whether the base should be 2, 10, or e, it is perhaps more important to see that there is a natural base-free variation
$H_m(\pi)$ on Shannon's entropy (see "block-count entropy" defined below).

^17 Since $1/p_i$ need not be an integer (or even rational), one could interpret the equiprobable "number of elements"
as being heuristic or one could restate it in continuous terms. The continuous version is the uniform distribution on
the real interval $[0, 1/p_i]$ where the probability of an outcome in the unit interval $[0, 1]$ is $1/(1/p_i) = p_i$.

^18 In continuous terms, the numbers-equivalent is the length of the interval $[0, 1/p_i]$ with the uniform distribution
on it.

For comparison purposes, we may nevertheless carry over the heuristic reasoning to the case
of logical entropy. For each block $B$, we take the same hypothetical numbers-equivalent partition
$\pi_B$ with $\frac{|U|}{|B|} = \frac{1}{p_B}$ equal blocks of size $|B|$ and then take the desired block entropy $h(B)$ as the
normalized dit count $h(\pi_B)$ for that partition. Each block contributes $p_B(1 - p_B)$ to the normalized
dit count and there are $|U|/|B| = 1/p_B$ blocks in $\pi_B$ so the total normalized dit count simplifies
to: $h(\pi_B) = \frac{1}{p_B} p_B (1 - p_B) = 1 - p_B = h(B)$, which we could take as the logical block entropy.
Then the average of these logical block entropies gives the logical entropy $h(\pi) = \sum_{B\in\pi} p_B h(B) =
\sum_{B\in\pi} p_B (1 - p_B)$ of the partition $\pi$, all in the manner of the heuristic development of Shannon's
entropy $H(\pi) = \sum_{B\in\pi} p_B H(B)$.


There is, however, no need to go through this reasoning to arrive at the logical entropy of a
partition as the average of block entropies. The interpretation of the logical entropy as the normalized
dit count survives the averaging even though all the blocks of $\pi$ might have different sizes, i.e., the
interpretation "commutes" with the averaging of block entropies. Thus $h(\pi)$ is the actual dit count
(normalized) for a partition $\pi$, not just the average of block entropies $h(B)$ that could be interpreted
as the normalized dit counts for hypothetical partitions $\pi_B$.

The interpretation of the Shannon measure of information as the minimum number of binary 
questions it takes to single out a designated block does not commute with the averaging over the 
set of different-sized blocks in a partition. Hence the Shannon entropy of a partition is the expected 
number of bits it takes to single out the designated block while the logical entropy of a partition on 
a set is the actual number of dits (normalized) distinguished by the partition. 
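The contrast can be checked numerically. The sketch below is an illustration only: the partition, the helper names (`dit_count`, `logical_entropy`, `shannon_entropy`), and the representation of blocks as Python sets are assumptions made for the example. It counts the ordered pairs in distinct blocks directly and compares the normalized count with the average of the block entropies $1 - p_B$; the Shannon entropy is computed as the average of $\log_2(1/p_B)$.

```python
from itertools import product
from math import log2


def dit_count(blocks):
    """Number of ordered pairs (u, v) whose elements lie in different blocks."""
    block_of = {u: k for k, block in enumerate(blocks) for u in block}
    return sum(1 for u, v in product(block_of, repeat=2)
               if block_of[u] != block_of[v])


def logical_entropy(blocks):
    """Average of the block entropies h(B) = 1 - p_B."""
    n = sum(len(b) for b in blocks)
    return sum((len(b) / n) * (1 - len(b) / n) for b in blocks)


def shannon_entropy(blocks):
    """Average of the block entropies H(B) = log2(1/p_B)."""
    n = sum(len(b) for b in blocks)
    return sum((len(b) / n) * log2(n / len(b)) for b in blocks)


# A partition of an 8-element set with unequal blocks (illustrative).
partition = [{0, 1, 2, 3}, {4, 5}, {6}, {7}]
size = 8

print(dit_count(partition) / size ** 2, logical_entropy(partition))
print(shannon_entropy(partition))
```

For this partition the direct normalized dit count and the block average agree (both are 42/64), while the Shannon value 1.75 is an expected number of bits rather than a count of anything in the partition itself.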

The last step in connecting Shannon entropy and logical entropy is to rephrase the heuristics 
behind Shannon entropy in terms of "making all the distinctions" rather than "singling out the 
designated element." 

3.2 Distinction-based Treatment of Shannon's Entropy 

The search-theoretic approach was the heritage of the original application of information theory to 
communications where the focus was on singling out a designated element, the sent message. In 
the "twenty questions" version, one person picks a hidden element and the other person seeks the 
minimum number of binary partitions on the set of possible answers to single out the answer. But 
it is simple to see that the focus on the single designated element was unnecessary. The essential 
point was to make all the distinctions to separate the elements, since any element could have been
the designated one. If the join of the minimum number of binary partitions did not distinguish 
all the elements into singleton blocks, then one could not have picked out the hidden element if it 
was in a non-singleton block. Hence the distinction-based treatment of Shannon's entropy amounts 
to rephrasing the above heuristic argument in terms of "making all the distinctions" rather than 
"making the distinctions necessary to single out any designated element." 

In the basic example of $|U| = 2^n$ where we may think of the $2^n$ like or equiprobable elements
as being encoded with $n$ binary digit numbers, then $n = \log\left(\frac{1}{1/2^n}\right)$ is the minimum number of
binary partitions (each partitioning according to one of the $n$ digits) necessary to make all the
distinctions between the elements, i.e., the minimum number of binary partitions whose join is the
discrete partition with singleton blocks (each block probability being $p_B = 1/2^n$). Generalizing to
any set $U$ of equiprobable elements, the minimum number of bits necessary to distinguish all the
elements from each other is $\log\left(\frac{1}{1/|U|}\right) = \log(|U|)$. Given a partition $\pi = \{B\}_{B\in\pi}$ on $U$, the block
entropy $H(B) = \log\left(\frac{1}{p_B}\right)$ is the minimum number of bits necessary to distinguish all the blocks
in the numbers-equivalent partition $\pi_B$, and the average of those block entropies gives the Shannon
entropy:

$$H(\pi) = \sum_{B\in\pi} p_B \log\left(\frac{1}{p_B}\right).$$

The point of rephrasing the heuristics behind Shannon's definition of entropy in terms of the
average bits needed to "make all the distinctions" is that it can then be directly compared with the
logical definition of entropy which is simply the total number of distinctions normalized by $|U|^2$.
Thus the two definitions of entropy boil down to two different ways of measuring the totality of
distinctions. A third way to measure the totality of distinctions, called the "block-count entropy,"
is defined below. Hence we have our overall theme that these three notions of entropy boil down to
three ways of "counting distinctions."

3.3 Relationships Between the Block Entropies 

Since the logical and Shannon entropies have formulas presenting them as averages of block entropies,
$h(\pi) = \sum_{B\in\pi} p_B (1 - p_B)$ and $H(\pi) = \sum_{B\in\pi} p_B \log\left(\frac{1}{p_B}\right)$, the two notions are precisely related by
their respective block entropies, $h(B) = 1 - p_B$ and $H(B) = \log\left(\frac{1}{p_B}\right)$. Solving each for $p_B$ and
then eliminating it yields the:

Block entropy relationship: $h(B) = 1 - \frac{1}{2^{H(B)}}$ and $H(B) = \log\left(\frac{1}{1 - h(B)}\right)$.

The block entropy relation, $h(B) = 1 - \frac{1}{2^{H(B)}}$, has a simple probabilistic interpretation. Thinking
of $H(B)$ as an integer, $H(B)$ is the Shannon entropy of the discrete partition on $U$ with $|U| = 2^{H(B)}$
elements while $h(B) = 1 - \frac{1}{2^{H(B)}} = 1 - p_B$ is the logical entropy of that partition since $1/2^{H(B)}$
is the probability of each block in that discrete partition. The probability that a random pair is
distinguished by a discrete partition is just the probability that the second draw is distinct from the
first draw. Given the first draw from a set of $2^{H(B)}$ individuals, the probability that the second draw
(with replacement) is different is $1 - \frac{1}{2^{H(B)}} = h(B)$.
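The two directions of the block entropy relationship can be checked with a short Python sketch. The function names `h_from_H` and `H_from_h` and the sample block probabilities are illustrative assumptions, not notation from the text.

```python
from math import log2


def h_from_H(H_B):
    """Logical block entropy from Shannon block entropy: h = 1 - 1/2^H."""
    return 1.0 - 1.0 / 2.0 ** H_B


def H_from_h(h_B):
    """Shannon block entropy from logical block entropy: H = log2(1/(1-h))."""
    return log2(1.0 / (1.0 - h_B))


for p_B in (0.5, 0.25, 0.1):  # sample block probabilities
    H_B = log2(1 / p_B)       # Shannon block entropy
    h_B = 1 - p_B             # logical block entropy
    print(p_B, H_B, h_from_H(H_B), h_B, H_from_h(h_B))
```

Each printed row shows that converting $H(B)$ to $h(B)$, or back, recovers the value computed directly from $p_B$ (up to floating-point rounding).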

To summarize the comparison up to this point, the logical theory and Shannon's theory start by
posing different questions which then turn out to be precisely related. Shannon's statistical theory
of communications is concerned with determining the sent message out of a set of possible messages.
In the basic case, the messages are equiprobable so it is abstractly the problem of determining the
hidden designated element out of a set of equiprobable elements which, for simplicity, we can assume
has $2^n$ elements. The process of determining the hidden element can be conceptualized as the process
of asking binary questions which split the set of possibilities into equiprobable parts. The answer to
the first question determines which subset of $2^{n-1}$ elements contains the hidden element and that
provides 1 bit of information. An independent equal-blocked binary partition would split each of the
$2^{n-1}$ element blocks into equal blocks with $2^{n-2}$ elements each. Thus 2 bits of information would
determine which of those $2^2$ blocks contained the hidden element, and so forth. Thus $n$ independent
equal-blocked binary partitions would determine which of the resulting $2^n$ blocks contains the hidden
element. Since there are $2^n$ elements, each of those blocks is a singleton so the hidden element has
been determined. Hence the problem of finding a designated element among $2^n$ equiprobable elements
requires $\log(2^n) = n$ bits of information.

The logical theory starts with the basic notion of a distinction between elements and defines
the logical information in a set of $2^n$ distinct elements as the (normalized) number of distinctions
that need to be made to distinguish the $2^n$ elements. The distinctions are counted as ordered rather
than unordered pairs (in order to better apply the machinery of probability theory) and the number
of distinctions or dits is normalized by the number of all ordered pairs. Hence a set of $2^n$ distinct
elements would involve $|U \times U - \Delta_U| = 2^n \times 2^n - 2^n = 2^{2n} - 2^n = 2^n (2^n - 1)$ distinctions which
normalizes to $\frac{2^{2n} - 2^n}{2^{2n}} = 1 - \frac{1}{2^n}$.

There is, however, no need to motivate Shannon's entropy by focusing on the search for a
designated element. The task can equivalently be taken as distinguishing all elements from each
other rather than distinguishing a designated element from all the other elements. The connection
between the two approaches can be seen by computing the total number of distinctions made by
intersecting the $n$ independent equal-blocked binary partitions in Shannon's approach.

Example of counting distinctions: Doing the computation, the first partition which creates two
sets of $2^{n-1}$ elements each thereby creates $2^{n-1} \times 2^{n-1} = 2^{2n-2}$ distinctions as unordered pairs
and $2 \times 2^{2n-2} = 2^{2n-1}$ distinctions as ordered pairs. The next binary partition splits each of
those blocks into equal blocks of $2^{n-2}$ elements. Each split block creates $2^{n-2} \times 2^{n-2} = 2^{2n-4}$
new distinctions as unordered pairs and there were two such splits so there are $2 \times 2^{2n-4} = 2^{2n-3}$
additional unordered pairs of distinct elements created or $2^{2n-2}$ new ordered pair distinctions.
In a similar manner, the third partition creates $2^{2n-3}$ new dits and so forth down to the $n$th
partition which adds $2^{2n-n}$ new dits. Thus in total, the intersection of the $n$ independent equal-
blocked binary partitions has created $2^{2n-1} + 2^{2n-2} + \ldots + 2^{2n-n} = 2^n (2^{n-1} + 2^{n-2} + \ldots + 2^0) =
2^n \left(\frac{2^n - 1}{2 - 1}\right) = 2^n (2^n - 1)$ (ordered pair) distinctions which are all the dits on a set with $2^n$
elements. This is the instance of the block entropy relationship $h(B) = 1 - \frac{1}{2^{H(B)}}$ when the
block $B$ is a singleton in a $2^n$ element set so that $H(B) = \log\left(\frac{1}{1/2^n}\right) = \log(2^n) = n$ and
$h(B) = 1 - \frac{1}{2^{H(B)}} = 1 - \frac{1}{2^n}$.

Thus the Shannon entropy as the number of independent equal-blocked binary partitions it takes
to single out a hidden designated element in a $2^n$ element set is also the number of independent
equal-blocked binary partitions it takes to distinguish all the elements of a $2^n$ element set from each
other.
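The count can be verified by brute force for small $n$. In the following Python sketch the elements are assumed to be the integers $0,\ldots,2^n - 1$, the $i$th partition groups them by their $i$th binary digit (most significant first), and the function name `new_dits_per_digit` and its optional `base` parameter are hypothetical conveniences added for this illustration.

```python
from itertools import product


def new_dits_per_digit(n, base=2):
    """For each digit position, count the ordered pairs first distinguished there.

    The i-th partition groups the integers in range(base**n) by their i-th
    digit, with i = 0 the most significant digit.
    """
    size = base ** n

    def digit(u, i):
        return (u // base ** (n - 1 - i)) % base

    counts = [0] * n
    for u, v in product(range(size), repeat=2):
        for i in range(n):
            if digit(u, i) != digit(v, i):
                counts[i] += 1  # this pair becomes a dit at partition i
                break
    return counts


counts = new_dits_per_digit(3)
print(counts, sum(counts))
```

For $n = 3$ the three partitions add $2^5$, $2^4$, and $2^3$ new dits, and the total is $2^3(2^3 - 1) = 56$, i.e., every ordered pair of distinct elements.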

The connection between Shannon entropy and logical entropy boils down to two points. 

1. The first point is the basic fact that for binary partitions to single out a hidden element ("sent 
message") in a set is the same as the partitions distinguishing any pair of distinct elements 
(since if a pair was left undistinguished, the hidden element could not be singled out if it
were one of the elements in that undistinguished pair). This gives what might be called the 
distinction interpretation of Shannon entropy as a count of the binary partitions necessary 
to distinguish between all the distinct messages in the set of possible messages in contrast 
to the usual search interpretation as the binary partition count necessary to find the hidden 
designated element such as the sent message. 

2. The second point is that in addition to the Shannon count of the binary partitions necessary 
to make all the distinctions, we may use the logical measure that is simply the (normalized) 
count of the distinctions themselves. 

3.4 A Coin-Weighing Example

The logic of the connection between joining independent equal-blocked partitions and efficiently
creating dits is not dependent on the choice of base 2. Consider the coin-weighing problem where
one has a balance scale and a set of $3^n$ coins all of which look alike but one is counterfeit (the
hidden designated element) and is lighter than the others. The coins might be numbered using the
$n$-digit numbers in mod 3 arithmetic where the three digits are 0, 1, and 2. The $n$ independent
ternary partitions are arrived at by dividing the coins into three piles according to the $i$th digit for
$i = 1,\ldots,n$. To use the $n$ partitions to find the false coin, two of the piles are put on the balance scale.
If one side is lighter, then the counterfeit coin is in that block. If the two sides balance, then the light
coin is in the third block of coins not on the scale. Thus $n$ weighings (i.e., the join of $n$ independent
equal-blocked ternary partitions) will determine the $n$ ternary digits of the false coin, and thus the
ternary Shannon entropy is $\log_3\left(\frac{1}{1/3^n}\right) = \log_3(3^n) = n$ trits. As before we can interpret the joining
of independent partitions not only as the most efficient way to find the hidden element (e.g., the
false coin or the sent message) but as the most efficient way to make all the distinctions between
the elements of the set.
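The weighing scheme just described can be simulated directly. In this Python sketch the coins are assumed to be numbered $0,\ldots,3^n - 1$, genuine coins weigh 10 units and the counterfeit 9 (arbitrary illustrative values), and the function name `find_light_coin` is hypothetical.

```python
def find_light_coin(weights, n):
    """Locate the single lighter coin among 3**n coins with n weighings.

    Weighing i compares the pile of coins whose i-th ternary digit is 0
    against the pile whose i-th ternary digit is 1 (i = 0 is the most
    significant digit). Returns the coin number and its ternary digits.
    """
    assert len(weights) == 3 ** n
    digits = []
    for i in range(n):
        place = 3 ** (n - 1 - i)
        piles = [[c for c in range(3 ** n) if (c // place) % 3 == d]
                 for d in range(3)]
        left = sum(weights[c] for c in piles[0])
        right = sum(weights[c] for c in piles[1])
        if left < right:
            digits.append(0)   # light coin is in the left pile
        elif right < left:
            digits.append(1)   # light coin is in the right pile
        else:
            digits.append(2)   # scale balances: coin is in the unweighed pile
    coin = sum(d * 3 ** (n - 1 - i) for i, d in enumerate(digits))
    return coin, digits


weights = [10] * 27
weights[14] = 9  # the counterfeit coin (illustrative choice)
print(find_light_coin(weights, 3))
```

Each weighing reveals one ternary digit of the false coin, so three weighings identify it among 27 coins.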

The first partition (separating by the first ternary digit) creates 3 equal blocks of $3^{n-1}$ elements
each so that creates $3 \times 3^{n-1} \times 3^{n-1} = 3^{2n-1}$ unordered pairs of distinct elements or $2 \times 3^{2n-1}$
ordered pair distinctions. The partition according to the second ternary digit divides each of these
three blocks into three equal blocks of $3^{n-2}$ elements each so the additional unordered pairs created
are $3 \times 3 \times 3^{n-2} \times 3^{n-2} = 3^{2n-2}$ or $2 \times 3^{2n-2}$ ordered pair distinctions. Continuing in this fashion,
the $n$th ternary partition adds $2 \times 3^{2n-n}$ dits. Hence the total number of dits created by joining the
$n$ independent partitions is:

$$2 \times \left[3^{2n-1} + 3^{2n-2} + \ldots + 3^n\right] = 2 \times \left[3^n \left(3^{n-1} + 3^{n-2} + \ldots + 1\right)\right] = 2 \times \left[3^n \frac{3^n - 1}{3 - 1}\right] = 3^n (3^n - 1)$$

which is the total number of ordered pair distinctions between the elements of the $3^n$ element
set. Thus the Shannon measure in trits is the minimum number of ternary partitions needed to
create all the distinctions between the elements of a set. The base-3 Shannon entropy is $H_3(\pi) =
\sum_{B\in\pi} p_B \log_3\left(\frac{1}{p_B}\right)$ which for this example of the discrete partition $\hat{1}$ on a $3^n$ element set $U$ is $H_3(\hat{1}) =
\sum_{u\in U} \frac{1}{3^n} \log_3\left(\frac{1}{1/3^n}\right) = \log_3(3^n) = n$ which can also be thought of as the block value entropy for a
singleton block so that we may apply the block value relationship. The logical entropy of the discrete
partition on this set is: $h(\hat{1}) = \frac{3^n (3^n - 1)}{3^{2n}} = 1 - \frac{1}{3^n}$ which could also be thought of as the block value
of the logical entropy for a singleton block. Thus the entropies for the discrete partition stand in the
block value relationship which for base 3 is:

$$h(B) = 1 - \frac{1}{3^{H_3(B)}}.$$

The example helps to show how the logical notion of a distinction underlies the Shannon measure 
of information, and how a complete procedure for finding the hidden element (e.g., the sent message) 
is equivalent to being able to make all the distinctions in a set of elements. But this should not be 
interpreted as showing that Shannon's information theory "reduces" to the logical theory. The
Shannon theory is addressing an additional question of finding the unknown element. One can have
all the distinctions between elements, e.g., the assignment of distinct base-3 numbers to the $3^n$
coins, without knowing which element is the designated one. Information theory becomes a theory
of the transmission of information, i.e., a theory of communication, when that second question of 
"receiving the message" as to which element is the designated one is the focus of analysis. In the 
coin example, we might say that the information about the light coin was always there in the nature 
of the situation (i.e., taking "nature" as the sender) but was unknown to an observer (i.e., on the 
receiver side). The coin weighing scheme was a way for the observer to elicit the information out of 
the situation. Similarly, the game of twenty questions is about finding a way to uncover the hidden 
answer — which was all along distinct from the other possible answers (on the sender side). It is this 
question of the transmission of information (and the noise that might interfere with the process) that 
carries Shannon's statistical theory of communications well beyond the bare-bones logical analysis 
of information in terms of distinctions. 



3.5 Block-count Entropy 

The fact that the Shannon motivation works for other bases than 2 suggests that there might be
a base-free version of the Shannon measure (the logical measure is already base-free). Sometimes
the reciprocal $\frac{1}{p_B}$ of the probability of an event $B$ is interpreted as the "surprise-value information"
conveyed by the occurrence of $B$. But there is a better concept to use than the vague notion of
"surprise-value information." For any positive probability $p_0$, we defined the reciprocal $\frac{1}{p_0}$ as the
equivalent number of (equiprobable) elements (always "as it were" since it need not be an integer)
since that is the number of equiprobable elements in a set so that the probability of choosing any
particular element is $p_0$. The "big surprise" as a small probability event occurs means it is "as
if" a particular element was picked from a big set of elements. For instance, for a block probability
$p_B = \frac{|B|}{|U|}$, its numbers-equivalent is the number of blocks $\frac{|U|}{|B|} = \frac{1}{p_B}$ in the hypothetical equal-blocked
partition $\pi_B$ with each block equiprobable with $B$. Our task is to develop this number-of-blocks or
block-count measure of information for partitions.

The block-count block entropy $H_m(B)$ is just the number of blocks in the hypothetical number-
of-equivalent-blocks partition $\pi_B$ where $B$ is one of $\frac{|U|}{|B|} = \frac{1}{p_B}$ associated similar blocks so that
$H_m(B) = \frac{1}{p_B}$.

If events $B$ and $C$ were independent, then $p_{B\cap C} = p_B p_C$ so the equivalent number of elements
associated with the occurrence of both events is the product $\frac{1}{p_{B\cap C}} = \frac{1}{p_B} \frac{1}{p_C}$ of the number of elements
associated with the separate events. This suggests that the average of the block entropies $H_m(B) =
\frac{1}{p_B}$ should be the multiplicative average (or geometric mean) rather than the arithmetical average.

Hence we define the number-of-equivalent-blocks entropy or, in short, block-count entropy of a
partition $\pi$ (which does not involve any choice of a base for logs) as the geometric mean of block
entropies:

Block-count entropy: $H_m(\pi) = \prod_{B\in\pi} H_m(B)^{p_B} = \prod_{B\in\pi} \left(\frac{1}{p_B}\right)^{p_B}$ blocks.

Finding the designated block in $\pi$ is the same on average as finding the designated block in a partition
with $H_m(\pi)$ equal blocks. But since $H_m(\pi)$ need not be an integer, one might take the reciprocal
to obtain the probability interpretation: finding the designated block in $\pi$ is the same on average as
the occurrence of an event with probability $1/H_m(\pi)$.

Given a finite-valued random variable $X$ with the values $\{x_1,\ldots,x_n\}$ with the probabilities
$\{p_1,\ldots,p_n\}$, the additive expectation is: $E[X] = \sum_{i=1}^{n} p_i x_i$ and the multiplicative expectation is:
$E_m[X] = \prod_{i=1}^{n} x_i^{p_i}$. Treating the block probability as a random variable defined on the blocks of a
partition, all three entropies can be expressed as expectations:

$$H_m(\pi) = E_m\left[\frac{1}{p_B}\right], \qquad H(\pi) = E\left[\log\left(\frac{1}{p_B}\right)\right], \qquad h(\pi) = E\left[1 - p_B\right].$$

The usual (additive) Shannon entropy is then obtained as the $\log_2$ version of this "log-free"
block-count entropy:

$$\log_2(H_m(\pi)) = \log\left(\prod_{B\in\pi} \left(\frac{1}{p_B}\right)^{p_B}\right) = \sum_{B\in\pi} \log\left(\left(\frac{1}{p_B}\right)^{p_B}\right) = \sum_{B\in\pi} p_B \log\left(\frac{1}{p_B}\right) = H(\pi).$$

Or viewed the other way around, $H_m(\pi) = 2^{H(\pi)}$.^19 The base 3 entropy encountered in the coin-
weighing example is obtained by taking logs to that base: $H_3(\pi) = \log_3(H_m(\pi))$, and similarly for
the Shannon entropy with natural logs: $H_e(\pi) = \log_e(H_m(\pi))$, or with common logs: $H_{10}(\pi) =
\log_{10}(H_m(\pi))$.
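A short Python sketch can illustrate the geometric-mean definition and its relation to the log-based entropies. The function names and the sample block probabilities are assumptions made for this illustration; `math.prod` requires Python 3.8 or later.

```python
from math import log, log2, prod


def block_count_entropy(probs):
    """H_m: geometric mean of the block counts 1/p_B, weighted by p_B."""
    return prod((1.0 / p) ** p for p in probs)


def shannon_entropy(probs, base=2):
    """Arithmetic mean of log(1/p_B) in the given base."""
    return sum(p * log(1.0 / p, base) for p in probs)


probs = [0.5, 0.25, 0.125, 0.125]  # illustrative block probabilities

H_m = block_count_entropy(probs)
print(H_m, 2 ** shannon_entropy(probs))          # H_m versus 2^H
print(log2(H_m), shannon_entropy(probs))         # log2(H_m) versus H
print(log(H_m, 3), shannon_entropy(probs, 3))    # base-3 version
```

The same block-count value $H_m(\pi)$ yields the base-2 or base-3 Shannon entropy simply by taking the logarithm to that base.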

Note that this relation $H_m(\pi) = 2^{H(\pi)}$ is a result, not a definition. The block-count entropy
was defined from "scratch" in a manner similar to the usual Shannon entropy (which thus might
be called the "$\log_2$-of-block-count entropy" or "binary-partition-count entropy"). In a partition of
individual organisms by species, the interpretation of $2^{H(\pi)}$ (or $e^{H_e(\pi)}$ when natural logs are used)
is the "number of equally common species" [Ml p. 514]. MacArthur argued that this block-count 

^19 Thus we expect the number-of-blocks entropy to be multiplicative where the usual Shannon entropy is additive
(e.g., for stochastically independent partitions) and hence the subscript on $H_m(\pi)$.

entropy (where a block is a species) will "accord much more closely with our intuition..." (than the
usual Shannon entropy).

The block-count entropy is the information measure that takes the count of a set (of like
elements) as the measure of the information in the set. That is, for the discrete partition $\hat{1}$ on $U$, each $p_B$
is $\frac{1}{|U|}$ so the block-count entropy of the discrete partition is $H_m(\hat{1}) = \prod_{u\in U} |U|^{1/|U|} = |U|$ which could
also be obtained as $2^{H(\hat{1})}$ since $H(\hat{1}) = \log(|U|)$ is the $\log_2$-of-block-count Shannon entropy of $\hat{1}$.
Hence, the natural choice of unit for the block-count entropy is "blocks" (as in $H_m(\hat{1}) = |U|$ blocks
in the discrete partition on $U$). The block-count entropy of the discrete partition on an equiprobable
$3^n$ element set is $3^n$ blocks. Hence the Shannon entropy with base 3 would be the $\log_3$-of-block-
count entropy: $\log_3(H_m(\hat{1})) = \log_3(3^n) = n$ trits as in the coin-weighing example above. The
block value relationship between the block-count entropy and the logical entropy in general is:

$$h(B) = 1 - \frac{1}{H_m(B)}$$

where $H_m(B) = 1/p_B = 2^{H(B)} = 3^{H_3(B)} = e^{H_e(B)} = 10^{H_{10}(B)}$.

4 Analogous Concepts for Shannon and Logical Entropies 
4.1 Independent Partitions 

It is sometimes asserted that "information" should be additive for independent^20 partitions but the
underlying mathematical fact is that the block-count is multiplicative for independent partitions
and Shannon chose to use the logarithm of the block-count as his measure of information.

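If two partitions $\pi = \{B\}_{B\in\pi}$ and $\sigma = \{C\}_{C\in\sigma}$ are independent, then the block counts (i.e.,
the block entropies for the block-count entropy) multiply, i.e., $H_m(B \cap C) = \frac{1}{p_{B\cap C}} = \frac{1}{p_B} \frac{1}{p_C} =
H_m(B) H_m(C)$. Hence for the multiplicative expectations we have:

$$H_m(\pi \vee \sigma) = \prod_{B,C} H_m(B \cap C)^{p_{B\cap C}} = \prod_{B,C} \left[H_m(B) H_m(C)\right]^{p_B p_C} = \left(\prod_{B\in\pi} H_m(B)^{p_B}\right) \left(\prod_{C\in\sigma} H_m(C)^{p_C}\right) = H_m(\pi) H_m(\sigma),$$

or taking logs to any desired base such as 2:

$$H(\pi \vee \sigma) = \log_2(H_m(\pi \vee \sigma)) = \log_2(H_m(\pi) H_m(\sigma)) = \log_2(H_m(\pi)) + \log_2(H_m(\sigma)) = H(\pi) + H(\sigma).$$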
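These two identities can be checked on a small example. The sketch below is an illustration under stated assumptions: the universe is a $3 \times 4$ grid of points, $\pi$ groups the points by rows (one row versus the other two), $\sigma$ groups them by columns (one column versus the other three), and the helper names are hypothetical. The block sizes were chosen so that $|B \cap C| = |B||C|/|U|$ for every pair of blocks.

```python
from math import log2, prod

# Universe: points (row, column) on a 3 x 4 grid, all equiprobable.
U = [(r, c) for r in range(3) for c in range(4)]

pi = [frozenset(u for u in U if u[0] == 0), frozenset(u for u in U if u[0] > 0)]
sigma = [frozenset(u for u in U if u[1] == 0), frozenset(u for u in U if u[1] > 0)]


def join(p, s):
    """Blocks of the join: all non-empty intersections of blocks."""
    return [b & c for b in p for c in s if b & c]


def probs(blocks):
    return [len(b) / len(U) for b in blocks]


def block_count_entropy(blocks):
    return prod((1 / p) ** p for p in probs(blocks))


def shannon_entropy(blocks):
    return sum(p * log2(1 / p) for p in probs(blocks))


# Independence check: p_{B n C} = p_B * p_C for every pair of blocks.
assert all(len(b & c) * len(U) == len(b) * len(c) for b in pi for c in sigma)

pv = join(pi, sigma)
print(block_count_entropy(pv), block_count_entropy(pi) * block_count_entropy(sigma))
print(shannon_entropy(pv), shannon_entropy(pi) + shannon_entropy(sigma))
```

Each printed pair agrees up to floating-point rounding: the block counts multiply and their logarithms add.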

Thus for independent partitions, the block-count entropies multiply and the log-of-block-count
entropies add. What happens to the logical entropies? We have seen that when the information in
a partition is represented by its dit set $\mathrm{dit}(\pi)$, then the overlap in the dit sets of any two non-
blob partitions is always non-empty. The dit set of the join of two partitions is just the union,
$\mathrm{dit}(\pi \vee \sigma) = \mathrm{dit}(\pi) \cup \mathrm{dit}(\sigma)$, so that union is never a disjoint union (when the dit sets are non-
empty). We have used the motivation of thinking of a partition-as-dit-set $\mathrm{dit}(\pi)$ as an "event" in
a sample space $U \times U$ with the probability of that event being the logical entropy of the partition.
The following proposition shows that this motivation extends to the notion of independence.

Proposition 3 If $\pi$ and $\sigma$ are (stochastically) independent partitions, then their dit sets $\mathrm{dit}(\pi)$ and
$\mathrm{dit}(\sigma)$ are independent as events in the sample space $U \times U$ (with equiprobable points).



Recall that "independent" means stochastic independence, so that partitions $\pi$ and $\sigma$ are independent if for all
$B \in \pi$ and $C \in \sigma$, $p_{B \cap C} = p_B p_C$.

For independent partitions $\pi$ and $\sigma$, we need to show that the probability $m(\pi,\sigma)$ of the event
$\mathrm{Mut}(\pi,\sigma) = \mathrm{dit}(\pi) \cap \mathrm{dit}(\sigma)$ is equal to the product of the probabilities $h(\pi)$ and $h(\sigma)$ of the
events $\mathrm{dit}(\pi)$ and $\mathrm{dit}(\sigma)$ in the sample space $U \times U$. By the assumption of independence, we have
$\frac{|B \cap C|}{|U|} = p_{B \cap C} = p_B p_C = \frac{|B||C|}{|U|^2}$ so that $|B \cap C| = |B||C|/|U|$. By the previous structure theorem
for the mutual information set: $\mathrm{Mut}(\pi,\sigma) = \bigcup_{B \in \pi, C \in \sigma} (B - (B \cap C)) \times (C - (B \cap C))$, where the
union is disjoint so that:



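$$\begin{aligned}
|\mathrm{Mut}(\pi,\sigma)| &= \sum_{B \in \pi, C \in \sigma} (|B| - |B \cap C|)(|C| - |B \cap C|) \\
&= \sum_{B \in \pi, C \in \sigma} \left(|B| - \frac{|B||C|}{|U|}\right)\left(|C| - \frac{|B||C|}{|U|}\right) \\
&= \frac{1}{|U|^2} \sum_{B \in \pi, C \in \sigma} |B|(|U| - |C|)\,|C|(|U| - |B|) \\
&= \frac{1}{|U|^2} \sum_{B \in \pi} |B||U - B| \sum_{C \in \sigma} |C||U - C| \\
&= \frac{1}{|U|^2}\,|\mathrm{dit}(\pi)|\,|\mathrm{dit}(\sigma)|.
\end{aligned}$$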

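Hence under independence, the normalized dit count $m(\pi,\sigma) = \frac{|\mathrm{Mut}(\pi,\sigma)|}{|U|^2} = \frac{|\mathrm{dit}(\pi)|}{|U|^2}\frac{|\mathrm{dit}(\sigma)|}{|U|^2} = h(\pi)\,h(\sigma)$
of the mutual information set $\mathrm{Mut}(\pi,\sigma) = \mathrm{dit}(\pi) \cap \mathrm{dit}(\sigma)$ is equal to the product of the normalized dit
counts of the partitions:

$m(\pi,\sigma) = h(\pi)\,h(\sigma)$ if $\pi$ and $\sigma$ are independent. ■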
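
Proposition 3 can be checked by brute force on a small case. The sketch below enumerates dit sets directly as sets of ordered pairs for an assumed 2 x 3 grid universe (rows for $\pi$, columns for $\sigma$); the example data and function names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

# Assumed toy universe: a 2 x 3 grid, partitioned by row and by column.
U = list(product(range(2), range(3)))
pi = [{u for u in U if u[0] == r} for r in range(2)]
sigma = [{u for u in U if u[1] == c} for c in range(3)]

def dit(partition):
    """Ordered pairs (u, v) whose members lie in different blocks."""
    block_of = {u: i for i, B in enumerate(partition) for u in B}
    return {(u, v) for u in U for v in U if block_of[u] != block_of[v]}

def normalized(pairs):
    """Probability of a set of ordered pairs as an event in U x U."""
    return Fraction(len(pairs), len(U) ** 2)

h_pi = normalized(dit(pi))
h_sigma = normalized(dit(sigma))
m = normalized(dit(pi) & dit(sigma))   # logical mutual information

print(h_pi, h_sigma, m, h_pi * h_sigma)
```

Exact rational arithmetic is used so that the product rule can be compared with equality rather than with a tolerance.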
4.2 Mutual Information 

For each of the major concepts in the information theory based on the usual Shannon measure, there
should be a corresponding concept based on the normalized dit counts of logical entropy. In the
following sections, we give some of these corresponding concepts and results.

The logical mutual information of two partitions $m(\pi,\sigma)$ is the normalized dit count of the
intersection of their dit-sets:

$$m(\pi,\sigma) = \frac{|\mathrm{dit}(\pi) \cap \mathrm{dit}(\sigma)|}{|U \times U|}.$$

For Shannon's notion of mutual information, we might apply the Venn diagram heuristics using a
block $B \in \pi$ and a block $C \in \sigma$. We saw before that the information contained in a block $B$ was
$H(B) = \log\left(\frac{1}{p_B}\right)$ and similarly for $C$ while $H(B \cap C) = \log\left(\frac{1}{p_{B \cap C}}\right)$ would correspond to the union
of the information in $B$ and in $C$. Hence the overlap or "mutual information" in $B$ and $C$ could be
motivated as the sum of the two informations minus the union:

$$I(B;C) = \log\left(\frac{1}{p_B}\right) + \log\left(\frac{1}{p_C}\right) - \log\left(\frac{1}{p_{B \cap C}}\right) = \log\left(\frac{1}{p_B p_C}\right) + \log(p_{B \cap C}) = \log\left(\frac{p_{B \cap C}}{p_B p_C}\right).$$

Then the (Shannon) mutual information in the two partitions is obtained by averaging over the
mutual information for each pair of blocks from the two partitions:

$$I(\pi;\sigma) = \sum_{B,C} p_{B \cap C} \log\left(\frac{p_{B \cap C}}{p_B p_C}\right).$$



See Cover and Thomas' book [6] for more background on the standard concepts. The corresponding notions for
the block-count entropy are obtained from the usual Shannon entropy notions by taking antilogs.

The mutual information can be expanded to verify the Venn diagram heuristics:

$$\begin{aligned}
I(\pi;\sigma) &= \sum_{B,C} p_{B \cap C} \log(p_{B \cap C}) + \sum_{B,C} p_{B \cap C} \log\left(\frac{1}{p_B}\right) + \sum_{B,C} p_{B \cap C} \log\left(\frac{1}{p_C}\right) \\
&= -H(\pi \vee \sigma) + \sum_{B \in \pi} p_B \log\left(\frac{1}{p_B}\right) + \sum_{C \in \sigma} p_C \log\left(\frac{1}{p_C}\right) = H(\pi) + H(\sigma) - H(\pi \vee \sigma).
\end{aligned}$$

We will later see an important inequality, $I(\pi;\sigma) \geq 0$ (with equality under independence), and its
logical version.

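In the logical theory, the corresponding "modular law" follows from the inclusion-exclusion
principle applied to dit-sets: $|\mathrm{dit}(\pi) \cap \mathrm{dit}(\sigma)| = |\mathrm{dit}(\pi)| + |\mathrm{dit}(\sigma)| - |\mathrm{dit}(\pi) \cup \mathrm{dit}(\sigma)|$. Normalizing
yields:

$$m(\pi,\sigma) = \frac{|\mathrm{dit}(\pi) \cap \mathrm{dit}(\sigma)|}{|U|^2} = \frac{|\mathrm{dit}(\pi)|}{|U|^2} + \frac{|\mathrm{dit}(\sigma)|}{|U|^2} - \frac{|\mathrm{dit}(\pi) \cup \mathrm{dit}(\sigma)|}{|U|^2} = h(\pi) + h(\sigma) - h(\pi \vee \sigma).$$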
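
The two modular laws can be compared side by side on a pair of partitions that is not independent. The following sketch computes each mutual information both from its definition and from the modular law; the six-element universe and the two partitions are assumed example data, and base-2 logarithms are an illustrative choice.

```python
from fractions import Fraction
from math import isclose, log2

# Assumed example: U = {0,...,5} with two partitions that are NOT independent.
U = range(6)
pi = [{0, 1, 2}, {3, 4, 5}]
sigma = [{0, 1}, {2, 3}, {4, 5}]
join = [B & C for B in pi for C in sigma if B & C]

def dit(partition):
    """Ordered pairs of elements lying in different blocks."""
    block_of = {u: i for i, B in enumerate(partition) for u in B}
    return {(u, v) for u in U for v in U if block_of[u] != block_of[v]}

def h(partition):
    """Logical entropy: normalized dit count."""
    return Fraction(len(dit(partition)), len(U) ** 2)

def p(block):
    return len(block) / len(U)

def H(partition):
    """Shannon entropy in bits."""
    return sum(p(B) * log2(1 / p(B)) for B in partition)

# Definitions: intersection of dit sets, and averaged block-pair information.
m = Fraction(len(dit(pi) & dit(sigma)), len(U) ** 2)
I = sum(p(B & C) * log2(p(B & C) / (p(B) * p(C)))
        for B in pi for C in sigma if B & C)

print(m, h(pi) + h(sigma) - h(join))
print(I, H(pi) + H(sigma) - H(join))
```

Here $m(\pi,\sigma)$ differs from $h(\pi)h(\sigma)$, as expected for partitions that are not independent, while both modular laws still hold.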

Since the formulas concerning the logical and Shannon entropies often have similar relationships,
e.g., $I(\pi;\sigma) = H(\pi) + H(\sigma) - H(\pi \vee \sigma)$ and $m(\pi,\sigma) = h(\pi) + h(\sigma) - h(\pi \vee \sigma)$, it is useful to also
emphasize some crucial differences. One of the most important special cases is for two partitions
that are (stochastically) independent. For independent partitions, it is immediate that
$I(\pi;\sigma) = \sum_{B,C} p_{B \cap C} \log\left(\frac{p_{B \cap C}}{p_B p_C}\right) = 0$, but we have already seen that for the logical mutual information,
$m(\pi,\sigma) > 0$ so long as neither partition is the blob $0$. However for independent partitions we have:

$m(\pi,\sigma) = h(\pi)\,h(\sigma)$

so the logical mutual information behaves like the probability of both events occurring in the case of
independence (as it must since logical entropy concepts have direct probabilistic interpretations). For
independent partitions, the relation $m(\pi,\sigma) = h(\pi)\,h(\sigma)$ means that the probability that a random
pair is distinguished by both partitions is the same as the probability that it is distinguished by one
partition times the probability that it is distinguished by the other partition. In simpler terms, for
independent $\pi$ and $\sigma$, the probability that $\pi$ and $\sigma$ both distinguish is the probability that $\pi$ distinguishes
times the probability that $\sigma$ distinguishes.

It is sometimes convenient to think in the complementary terms of an equivalence relation
"identifying" rather than a partition distinguishing. Since $h(\pi)$ can be interpreted as the probability
that a random pair of elements from $U$ are distinguished by $\pi$, i.e., as a distinction probability, its
complement $1 - h(\pi)$ can be interpreted as an identification probability, i.e., the probability that a
random pair is identified by $\pi$ (thinking of $\pi$ as an equivalence relation on $U$). In general,

$[1 - h(\pi)][1 - h(\sigma)] = 1 - h(\pi) - h(\sigma) + h(\pi)h(\sigma) = [1 - h(\pi \vee \sigma)] + [h(\pi)h(\sigma) - m(\pi,\sigma)]$

which could also be rewritten as:

$[1 - h(\pi \vee \sigma)] - [1 - h(\pi)][1 - h(\sigma)] = m(\pi,\sigma) - h(\pi)h(\sigma)$.

Hence:

if $\pi$ and $\sigma$ are independent: $[1 - h(\pi)][1 - h(\sigma)] = [1 - h(\pi \vee \sigma)]$.

Thus if $\pi$ and $\sigma$ are independent, then the probability that the join partition $\pi \vee \sigma$ identifies is
the probability that $\pi$ identifies times the probability that $\sigma$ identifies. In summary, if $\pi$ and $\sigma$ are
independent, then:

Binary-partition-count (Shannon) entropy: $H(\pi \vee \sigma) = H(\pi) + H(\sigma)$
Block-count entropy: $H_m(\pi \vee \sigma) = H_m(\pi)\,H_m(\sigma)$
Normalized-dit-count (logical) entropy: $h(\pi \vee \sigma) = 1 - [1 - h(\pi)][1 - h(\sigma)]$.
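
The identification-probability identity can be illustrated numerically. The sketch below evaluates the difference $[1 - h(\pi \vee \sigma)] - [1 - h(\pi)][1 - h(\sigma)]$ for one independent and one non-independent pair of partitions on an assumed six-element universe; the partitions and the helper name `gap` are hypothetical.

```python
from fractions import Fraction

def h(partition, n):
    """Logical entropy 1 - sum of p_B^2 with p_B = |B| / n."""
    return 1 - sum(Fraction(len(B), n) ** 2 for B in partition)

def join(a, b):
    """Blocks of the join: the non-empty intersections of blocks."""
    return [B & C for B in a for C in b if B & C]

def gap(a, b, n):
    """[1 - h(a v b)] - [1 - h(a)][1 - h(b)]; zero under independence."""
    return (1 - h(join(a, b), n)) - (1 - h(a, n)) * (1 - h(b, n))

n = 6                                   # assumed universe {0,...,5}
rows = [{0, 1, 2}, {3, 4, 5}]
cols = [{0, 3}, {1, 4}, {2, 5}]         # independent of rows
pairs = [{0, 1}, {2, 3}, {4, 5}]        # not independent of rows

print(gap(rows, cols, n))    # independent pair
print(gap(rows, pairs, n))   # non-independent pair
```

For the non-independent pair the gap is the quantity $m(\pi,\sigma) - h(\pi)h(\sigma)$ from the rewritten identity.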

4.3 Cross Entropy and Divergence 

Given a set partition $\pi = \{B\}_{B \in \pi}$ on set $U$, the "natural" or Laplacian probability distribution on
the blocks of the partition was $p_B = \frac{|B|}{|U|}$. The set partition $\pi$ also determines the set of distinctions
$\mathrm{dit}(\pi) \subseteq U \times U$ and the logical entropy of the partition was the Laplacian probability of the dit-set
as an event, i.e., $h(\pi) = \frac{|\mathrm{dit}(\pi)|}{|U \times U|} = \sum_B p_B(1 - p_B)$. But we may also "kick away the ladder" and
generalize all the definitions to any finite probability distributions $p = \{p_1, \ldots, p_n\}$. A probability
distribution $p$ might be given by finite-valued random variables $X$ on a sample space $U$ where
$p_i = \mathrm{Prob}(X = x_i)$ for the finite set of distinct values $x_i$ for $i = 1, \ldots, n$. Thus the logical entropy of
the random variable $X$ is: $h(X) = \sum_{i=1}^{n} p_i(1 - p_i) = 1 - \sum_i p_i^2$. The entropy is only a function of
the probability distribution of the random variable, not its values, so we could also take it simply as
a function of the probability distribution $p$, $h(p) = 1 - \sum_i p_i^2$. Taking the sample space as $\{1, \ldots, n\}$,
the logical entropy is still interpreted as the probability that two independent draws will draw
distinct points from $\{1, \ldots, n\}$. The further generalizations replacing probabilities by probability
density functions and sums by integrals are straightforward but beyond the scope of this paper
(which is focused on conceptual foundations rather than mathematical developments).
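
The "two independent draws" reading of $h(p)$ can be made concrete with a short sketch. The three-point distribution below is an assumed example; the code compares the closed form $1 - \sum_i p_i^2$ with a direct sum over all ordered pairs of distinct outcomes.

```python
from fractions import Fraction
from itertools import product

# Assumed example distribution on the sample space {1, 2, 3}.
p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]

# Closed form: h(p) = 1 - sum of p_i^2.
h_formula = 1 - sum(x * x for x in p)

# Direct reading: probability that two independent draws (i, j) are distinct.
h_draws = sum(p[i] * p[j]
              for i, j in product(range(len(p)), repeat=2) if i != j)

print(h_formula, h_draws)
```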

Given two probability distributions $p = \{p_1, \ldots, p_n\}$ and $q = \{q_1, \ldots, q_n\}$ on the same sample
space $\{1, \ldots, n\}$, we can again consider the drawing of a pair of points but where the first drawing
is according to $p$ and the second drawing according to $q$. The probability that the pair of points is
distinct would be a natural and more general notion of logical entropy which we will call the:

logical cross entropy: $h(p \Vert q) = \sum_i p_i(1 - q_i) = 1 - \sum_i p_i q_i = \sum_i q_i(1 - p_i) = h(q \Vert p)$

which is symmetric. The logical cross entropy is the same as the logical entropy when the distributions
are the same, i.e.,

if $p = q$, then $h(p \Vert q) = h(p)$.

The notion of cross entropy in conventional information theory is: $H(p \Vert q) = \sum_i p_i \log\left(\frac{1}{q_i}\right)$
which is not symmetrical due to the asymmetric role of the logarithm, although if $p = q$, then
$H(p \Vert q) = H(p)$. Then the Kullback-Leibler divergence $D(p \Vert q) = \sum_i p_i \log\left(\frac{p_i}{q_i}\right)$ is defined as a measure
of the distance or divergence between the two distributions where $D(p \Vert q) = H(p \Vert q) - H(p)$.
The information inequality is: $D(p \Vert q) \geq 0$ with equality if and only if $p_i = q_i$ for $i = 1, \ldots, n$ [6, p. 26].
Given two partitions $\pi$ and $\sigma$, the inequality $I(\pi;\sigma) \geq 0$ is obtained by applying the information
inequality to the two distributions $\{p_{B \cap C}\}$ and $\{p_B p_C\}$ on the sample space $\{(B, C) : B \in \pi, C \in \sigma\} = \pi \times \sigma$:

$I(\pi;\sigma) = \sum_{B,C} p_{B \cap C} \log\left(\frac{p_{B \cap C}}{p_B p_C}\right) = D(\{p_{B \cap C}\} \Vert \{p_B p_C\}) \geq 0$ with equality under independence.

But starting afresh, one might ask: "What is the natural measure of the difference or distance
between two probability distributions $p = \{p_1, \ldots, p_n\}$ and $q = \{q_1, \ldots, q_n\}$ that would always be
non-negative, and would be zero if and only if they are equal?" The (Euclidean) distance between the
two points in $\mathbb{R}^n$ would seem to be the "logical" answer, so we take that distance (squared) as the
definition of the:

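logical divergence (or logical relative entropy): $d(p \Vert q) = \sum_i (p_i - q_i)^2$,
which is symmetric and non-negative. We have component-wise:

$0 \leq (p_i - q_i)^2 = p_i^2 - 2p_i q_i + q_i^2 = 2\left[\frac{1}{n} - p_i q_i\right] - \left[\frac{1}{n} - p_i^2\right] - \left[\frac{1}{n} - q_i^2\right]$
so that taking the sum for $i = 1, \ldots, n$ gives:

$0 \leq d(p \Vert q) = \sum_i (p_i - q_i)^2 = 2\left[1 - \sum_i p_i q_i\right] - \left[1 - \sum_i p_i^2\right] - \left[1 - \sum_i q_i^2\right] = 2h(p \Vert q) - h(p) - h(q)$.

Thus we have the:

$0 \leq d(p \Vert q) = 2h(p \Vert q) - h(p) - h(q)$ with equality if and only if $p_i = q_i$ for $i = 1, \ldots, n$

Logical information inequality.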

If we take $h(p \Vert q) - \frac{1}{2}[h(p) + h(q)]$ as the Jensen difference [IHl, p. 25] between the two distributions,
then the logical divergence is twice the Jensen difference. The half-and-half probability distribution
$\frac{p+q}{2}$ that mixes $p$ and $q$ has the logical entropy of $h\left(\frac{p+q}{2}\right) = \frac{h(p \Vert q)}{2} + \frac{h(p) + h(q)}{4}$ so that:

$d(p \Vert q) = 4\left[h\left(\frac{p+q}{2}\right) - \frac{1}{2}\{h(p) + h(q)\}\right] \geq 0$.

The logical information inequality tells us that "mixing increases logical entropy" (or, to be precise,
mixing does not decrease logical entropy) which also follows from the fact that logical entropy
$h(p) = 1 - \sum_i p_i^2$ is a concave function.
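
The identities relating logical cross entropy, logical divergence, and the mixed distribution can be verified with exact rational arithmetic. The two distributions in this sketch are assumed example data, and the function names are illustrative only.

```python
from fractions import Fraction as F

def h(p):
    """Logical entropy: 1 - sum of p_i^2."""
    return 1 - sum(x * x for x in p)

def h_cross(p, q):
    """Logical cross entropy: 1 - sum of p_i * q_i."""
    return 1 - sum(x * y for x, y in zip(p, q))

def d(p, q):
    """Logical divergence: squared Euclidean distance between p and q."""
    return sum((x - y) ** 2 for x, y in zip(p, q))

# Assumed example distributions on a three-point sample space.
p = [F(1, 2), F(1, 3), F(1, 6)]
q = [F(1, 4), F(1, 4), F(1, 2)]
mix = [(x + y) / 2 for x, y in zip(p, q)]   # half-and-half mixture

print(h_cross(p, q), h_cross(q, p))                  # symmetry
print(d(p, q), 2 * h_cross(p, q) - h(p) - h(q))      # divergence identity
print(4 * (h(mix) - (h(p) + h(q)) / 2))              # Jensen-difference form
```

All three printed divergence expressions agree, and $d(p \Vert p) = 0$ recovers the equality case.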

An important special case of the logical information inequality is when $p = \{p_1, \ldots, p_n\}$ is the
uniform distribution with all $p_i = \frac{1}{n}$. Then $h(p) = 1 - \frac{1}{n}$ where the probability that a random pair
is distinguished (i.e., the random variable $X$ with $\mathrm{Prob}(X = x_i) = p_i$ has different values in two
independent samples) takes the specific form of the probability $1 - \frac{1}{n}$ that the second draw gets a
different value than the first. It may at first seem counterintuitive that in this case the cross entropy
is $h(p \Vert q) = h(p) + \sum_i p_i(p_i - q_i) = h(p) + \sum_i \frac{1}{n}\left(\frac{1}{n} - q_i\right) = h(p) = 1 - \frac{1}{n}$ for any $q = \{q_1, \ldots, q_n\}$.
But $h(p \Vert q)$ is the probability that the two points, say $i$ and $i'$, in the sample space $\{1, \ldots, n\}$ are
distinct when one draw was according to $p$ and the other according to $q$. Taking the first draw
according to $q$, the probability that the second draw is distinct from whatever point was determined
in the first draw is indeed $1 - \frac{1}{n}$ (regardless of probability $q_i$ of the point drawn on the first draw).
Then the divergence $d(p \Vert q) = 2h(p \Vert q) - h(p) - h(q) = \left(1 - \frac{1}{n}\right) - h(q)$ is a non-negative measure
of how much the probability distribution $q$ diverges from the uniform distribution. It is simply the
difference in the probability that a random pair will be distinguished by the uniform distribution
and by $q$. Also since $0 \leq d(p \Vert q)$, this shows that among all probability distributions on $\{1, \ldots, n\}$, the
uniform distribution has the maximum logical entropy. In terms of partitions, the n-block partition
with $p_B = \frac{1}{n}$ has maximum logical entropy among all n-block partitions. In the case of $|U|$ divisible
by $n$, the equal n-block partitions make more distinctions than any of the unequal n-block partitions
on $U$.

For any partition $\pi$ with the $n$ block probabilities $\{p_B\}_{B \in \pi} = \{p_1, \ldots, p_n\}$:

$h(\pi) \leq 1 - \frac{1}{n}$ with equality if and only if $p_1 = \ldots = p_n = \frac{1}{n}$.
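
The uniform special case can be checked in the same style. In the sketch below, `uniform` is the uniform distribution on an assumed three-point sample space and `q` is an arbitrary assumed distribution; the code confirms that the cross entropy does not depend on `q` and that the divergence equals the shortfall of $h(q)$ from the maximum $1 - \frac{1}{n}$.

```python
from fractions import Fraction as F

def h(p):
    """Logical entropy: 1 - sum of p_i^2."""
    return 1 - sum(x * x for x in p)

def h_cross(p, q):
    """Logical cross entropy: 1 - sum of p_i * q_i."""
    return 1 - sum(x * y for x, y in zip(p, q))

def d(p, q):
    """Logical divergence: sum of squared differences."""
    return sum((x - y) ** 2 for x, y in zip(p, q))

n = 3                                   # assumed sample-space size
uniform = [F(1, n)] * n
q = [F(1, 4), F(1, 4), F(1, 2)]         # assumed example distribution

print(h_cross(uniform, q), 1 - F(1, n))         # cross entropy is 1 - 1/n
print(d(uniform, q), (1 - F(1, n)) - h(q))      # divergence = shortfall of h(q)
```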

For the corresponding results in Shannon's information theory, we can apply the information
inequality $D(p \Vert q) = H(p \Vert q) - H(p) \geq 0$ with $q$ as the uniform distribution $q_1 = \ldots = q_n = \frac{1}{n}$. Then
$H(p \Vert q) = \sum_i p_i \log\left(\frac{1}{1/n}\right) = \log(n)$ so that: $H(p) \leq \log(n)$ or in terms of partitions:

$H(\pi) \leq \log_2(|\pi|)$ with equality if and only if the probabilities are equal

or, in base-free terms,

$H_m(\pi) \leq |\pi|$ with equality if and only if the probabilities are equal.

The three entropies take their maximum values (for fixed number of blocks $|\pi|$) at the partitions
with equiprobable blocks.

In information theory texts, it is customary to graph the case of $n = 2$ where the entropy
is graphed as a function of $p_1 = p$ with $p_2 = 1 - p$. The Shannon entropy function $H(p) =
-p \log(p) - (1 - p) \log(1 - p)$ looks somewhat like an inverted parabola with its maximum value of
$\log(n) = \log(2) = 1$ at $p = .5$. The logical entropy function $h(p) = 1 - p^2 - (1 - p)^2 = 2p - 2p^2 =
2p(1 - p)$ is an inverted parabola with its maximum value of $1 - \frac{1}{n} = 1 - \frac{1}{2} = .5$ at $p = .5$. The
block-count entropy function $H_m(p) = \left(\frac{1}{p}\right)^{p}\left(\frac{1}{1-p}\right)^{1-p} = 2^{H(p)}$ is an inverted U-shaped curve that starts
and ends at $1 = 2^{H(0)} = 2^{H(1)}$ and has its maximum at $2 = 2^{H(.5)}$.
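
The three curves for $n = 2$ can be tabulated with a few lines of Python. This is a minimal sketch that assumes base-2 logarithms and the usual convention $0 \log(1/0) = 0$ at the endpoints; the function names are illustrative.

```python
from math import log2

def shannon(p):
    """Binary Shannon entropy in bits, with 0 * log(1/0) taken as 0."""
    return sum(x * log2(1 / x) for x in (p, 1 - p) if x > 0)

def logical(p):
    """Binary logical entropy 1 - p^2 - (1 - p)^2 = 2p(1 - p)."""
    return 2 * p * (1 - p)

def block_count(p):
    """Binary block-count entropy, the antilog 2 ** H(p)."""
    return 2 ** shannon(p)

# Tabulate the three entropies on a coarse grid of p values.
for k in range(0, 11, 2):
    p = k / 10
    print(p, round(shannon(p), 4), round(logical(p), 4), round(block_count(p), 4))
```

On this grid each function peaks at $p = .5$, matching the maxima stated above.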



4.4 Summary of Analogous Concepts and Results 





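| Concept | Shannon Entropy | Logical Entropy |
|---|---|---|
| Block Entropy | $H(B) = \log(1/p_B)$ | $h(B) = 1 - p_B$ |
| Relationship | | $h(B) = 1 - \frac{1}{2^{H(B)}}$ |
| Entropy | $H(\pi) = \sum p_B \log(1/p_B)$ | $h(\pi) = \sum p_B(1 - p_B)$ |
| Mutual Information | $I(\pi;\sigma) = H(\pi) + H(\sigma) - H(\pi \vee \sigma)$ | $m(\pi,\sigma) = h(\pi) + h(\sigma) - h(\pi \vee \sigma)$ |
| Independence | $I(\pi;\sigma) = 0$ | $m(\pi,\sigma) = h(\pi)\,h(\sigma)$ |
| Independence & Joins | $H(\pi \vee \sigma) = H(\pi) + H(\sigma)$ | $h(\pi \vee \sigma) = 1 - [1 - h(\pi)][1 - h(\sigma)]$ |
| Cross Entropy | $H(p \Vert q) = \sum p_i \log(1/q_i)$ | $h(p \Vert q) = \sum p_i(1 - q_i)$ |
| Divergence | $D(p \Vert q) = H(p \Vert q) - H(p)$ | $d(p \Vert q) = 2h(p \Vert q) - h(p) - h(q)$ |
| Information Inequality | $D(p \Vert q) \geq 0$ with $=$ iff $p_i = q_i\ \forall i$ | $d(p \Vert q) \geq 0$ with $=$ iff $p_i = q_i\ \forall i$ |
| Info. Ineq. Sp. Case | $I(\pi;\sigma) = D(\{p_{B \cap C}\} \Vert \{p_B p_C\}) \geq 0$ with equality under independence | $d(\{p_{B \cap C}\} \Vert \{p_B p_C\}) \geq 0$ with equality under independence |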



5 Concluding Remarks 

In the duality of subsets of a set with partitions on a set, we found that the elements of a subset were
dual to the distinctions (dits) of a partition. Just as the finite probability theory for events started
by taking the size of a subset ("event") $S$ normalized to the size of the finite universe $U$ as the
probability $\mathrm{Prob}(S) = \frac{|S|}{|U|}$, so it would be natural to consider the corresponding theory that would
associate with a partition $\pi$ on a finite $U$, the size $|\mathrm{dit}(\pi)|$ of the set of distinctions of the partition
normalized by the total number of ordered pairs $|U \times U|$. This number $h(\pi) = \frac{|\mathrm{dit}(\pi)|}{|U \times U|}$ was called
the logical entropy of $\pi$ and could be interpreted as the probability that a randomly picked (with
replacement) pair of elements from $U$ is distinguished by the partition $\pi$, just as $\mathrm{Prob}(S) = \frac{|S|}{|U|}$ is
the probability that a randomly picked element from $U$ is an element of the subset $S$. Hence this
notion of logical entropy arises naturally out of the logic of partitions that is dual to the usual logic
of subsets.

The question immediately arises of the relationship with Shannon's concept of entropy. Following
Shannon's definition of entropy, there has been a veritable plethora of suggested alternative entropy
concepts [20]. Logical entropy is not an alternative entropy concept intended to displace Shannon's
concept any more than is the block-count entropy concept. Instead, I have argued that the dit-count,
block-count, and binary-partition-count concepts of entropy should be seen as three ways to
measure that same "information" expressed in its most atomic terms as distinctions. The block-count
entropy, although it can be independently defined, is trivially related to Shannon's
binary-partition-count concept: just take antilogs. The relationship of the logical concept of entropy to the Shannon
concept is a little more subtle but is quite simple at the level of blocks $B \in \pi$: $h(B) = 1 - p_B$,
$H_m(B) = \frac{1}{p_B}$, and $H(B) = \log\left(\frac{1}{p_B}\right)$ so that eliminating the probability, we have:

$$h(B) = 1 - \frac{1}{H_m(B)} = 1 - \frac{1}{2^{H(B)}}.$$

Then the logical and additive entropies for the whole partition are obtained by taking the (additive)
expectation of the block entropies while the block-count entropy is the multiplicative expectation of
the block entropies:

$$H_m(\pi) = \prod_{B \in \pi} \left(\frac{1}{p_B}\right)^{p_B}, \qquad H(\pi) = \sum_{B \in \pi} p_B \log\left(\frac{1}{p_B}\right), \qquad h(\pi) = \sum_{B \in \pi} p_B(1 - p_B).$$

In conclusion, the simple root of the matter is three different ways to "measure" the distinctions
that generate an n-element set. Consider a 4 element set. One measure of the distinctions that
distinguish a set of 4 elements is its cardinality 4, and that measure leads to the block-count entropy.
Another measure of that set is $\log_2(4) = 2$ which can be interpreted as the minimal number of binary
partitions necessary: (a) to single out any designated element as a singleton (search interpretation)
or, equivalently, (b) to distinguish all the elements from each other (distinction interpretation). That
measure leads to Shannon's entropy formula. And the third measure is the (normalized) count of
distinctions (counted as ordered pairs) necessary to distinguish all the elements from each other, i.e.,
$\frac{4 \times 4 - 4}{4 \times 4} = \frac{12}{16} = \frac{3}{4}$, which yields the logical entropy formula. These measures stand in the block value
relationship: $\frac{3}{4} = 1 - \frac{1}{4} = 1 - \frac{1}{2^2}$. It is just a matter of:

1. counting the elements distinguished (block-count entropy), 

2. counting the binary partitions needed to distinguish the elements (Shannon entropy), or 

3. counting the (normalized) distinctions themselves (logical entropy). 
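
The three measures of an n-element set can be computed side by side. The sketch below is a minimal illustration for the 4-element case discussed above; the function name `measures` is hypothetical, and the dit count is obtained by explicitly enumerating ordered pairs of distinct elements.

```python
from fractions import Fraction
from math import log2

def measures(n):
    """Return (block count, Shannon bits, normalized dit count) for an n-set."""
    U = range(n)
    # Distinctions: ordered pairs of distinct elements.
    dits = [(u, v) for u in U for v in U if u != v]
    return n, log2(n), Fraction(len(dits), n * n)

count, bits, dit_fraction = measures(4)
print(count, bits, dit_fraction)

# Block value relationship: h = 1 - 1/H_m = 1 - 1/2**H.
print(dit_fraction == 1 - Fraction(1, count))
print(dit_fraction == 1 - Fraction(1, 2 ** int(bits)))
```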